Description

BIOINFORMATICALLY 
DETECTABLE GROUP OF 
NOVEL REGULATORY 
GENES AND USES 
THEREOF 

APPENDIX DATA 

[0001] Sequence Listing named: SEQ_LIST.txt, 
size: 210838KB, created: Aug-29-2003, 
and containing genomic sequences 
described by the present invention, herein 
incorporated by reference. 

[0002] Compact discs containing Large
Tables. Compact Disc named Large Tables 1
contains file: Table1.txt (size: 12286KB,
created: Aug-26-2003), Large Tables 2
contains file: Table2.txt (size: 646904KB,
created: Aug-26-2003), Large Tables 3
contains files: Table3.txt (size: 178144KB,
created: Jan-05-2005), Table4.txt (size:
36689KB, created: Jan-05-2005), and
Table5.txt (size: 6207KB, created:
Jan-05-2005). These tables correspond to five
tables of the present invention, Table 1,
Table 2, Table 3, Table 4 and Table 5
respectively.

BACKGROUND OF INVENTION 
FIELD OF THE INVENTION 

[0003] 

The present invention relates to a group of
bioinformatically detectable novel genes,
here identified as "genomic address
messenger" or "GAM" genes, which are
believed to be related to the micro RNA
(miRNA) group of genes.

DESCRIPTION OF PRIOR ART 

[0004] Small RNAs are known to perform diverse
cellular functions, including
post-transcriptional gene expression regulation.
The first two such RNA genes, Lin-4 and
Let-7, were identified by genetic analysis
of Caenorhabditis elegans (Elegans)
developmental timing, and were termed
short temporal RNA (stRNA) (Wightman, B.,
Ha, I., Ruvkun, G., Cell 75, 855 (1993);
Erdmann, V.A. et al., Nucleic Acids Res.
29, 189 (2001); Lee, R.C., Feinbaum, R.L.,
Ambros, V., Cell 75, 843 (1993); Reinhart,
B. et al., Nature 403, 901 (2000)).

[0005]

Lin-4 and Let-7 each transcribe a ~22
nucleotide (nt) RNA, which acts as a
post-transcriptional repressor of target mRNAs,
by binding to elements in the 3'
untranslated region (UTR) of these target
mRNAs, which are complementary to the 22
nt sequence of Lin-4 and Let-7
respectively. While Lin-4 and Let-7 are
expressed at different developmental
stages, first larval stage and fourth larval
stage respectively, both specify the
temporal progression of cell fates, by
triggering post-transcriptional control over
other genes (Wightman, B., Ha, I., Ruvkun,
G., Cell 75, 855 (1993); Slack et al.,
Mol. Cell 5, 659 (2000)). Let-7 as well as its
temporal regulation have been
demonstrated to be conserved in all major
groups of bilaterally symmetrical animals,
from nematodes, through flies to humans
(Pasquinelli, A., et al., Nature 408, 86
(2000)).

[0006] 

The initial transcription product of Lin-4
and Let-7 is a ~60-80nt RNA, the
nucleotide sequence of the first half of
which is partially complementary to that of
its second half, therefore allowing this RNA
to fold onto itself, forming a "hairpin
structure". The final gene product is a
~22nt RNA, which is "diced" from the
above mentioned "hairpin structure" by an
enzyme called Dicer, which apparently
also mediates the complementary binding
of this ~22nt segment to a binding site in
the 3' UTR of its target gene.

[0007] 

Recent studies have uncovered 93 new
genes in this class, now referred to as
micro RNA or miRNA genes, in genomes of
Elegans, Drosophila, and Human
(Lagos-Quintana, M., Rauhut, R., Lendeckel, W.,
Tuschl, T., Science 294, 853 (2001); Lau,
N.C., Lim, L.P., Weinstein, E.G., Bartel, D.P.,
Science 294, 858 (2001); Lee, R.C.,
Ambros, V., Science 294, 862 (2001)). Like
the well studied Lin-4 and Let-7, all newly
found MIR genes produce a ~60-80nt RNA
having a nucleotide sequence capable of
forming a "hairpin structure". Expression
of the precursor ~60-80nt RNA and of the
resulting diced ~22nt RNA of most of these
newly discovered MIR genes has been
detected.

[0008] 

Based on the striking homology of the
newly discovered MIR genes to their
well-studied predecessors Lin-4 and Let-7, the
new MIR genes are believed to have a
similar basic function as that of Lin-4 and
Let-7: modulation of target genes by
complementary binding to the UTR of these
target genes, with special emphasis on
modulation of developmental control
processes. This is despite the fact that the
above mentioned recent studies did not
find target genes to which the newly
discovered MIR genes complementarily
bind. While existing evidence suggests that
the number of regulatory RNA genes "may
turn out to be very large, numbering in the
hundreds or even thousands in each
genome", detecting such genes is
challenging (Ruvkun, G., "Perspective:
Glimpses of a tiny RNA world", Science
294, 779 (2001)).

[0009] 

The ability to detect novel RNA genes is
limited by the methodologies used to
detect such genes. All RNA genes
identified so far either present a visibly
discernible whole body phenotype, as do
Lin-4 and Let-7 (Wightman et al., Cell 75,
855 (1993); Reinhart et al., Nature 403,
901 (2000)), or produce significant enough
quantities of RNA so as to be detected by
the standard biochemical genomic
techniques, as do the 93 recently detected
miRNA genes. Since a limited number of
clones were sequenced by the researchers
discovering these genes, 300 by Bartel and
100 by Tuschl (Bartel et al., Science
294, 858 (2001); Tuschl et al., Science
294, 853 (2001)), the RNA genes found
cannot be much rarer than 1% of all RNA
genes. The recently detected miRNA genes
therefore represent the more prevalent
members of the miRNA gene family.
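
The sampling argument above can be illustrated with a rough calculation. The following sketch assumes, as a simplification not stated in the text, that each sequenced clone is an independent draw and that an RNA species making up a given fraction of the pool appears in any one clone with that probability. The function name detection_probability is hypothetical and is introduced only for illustration.

```python
def detection_probability(frequency: float, clones: int) -> float:
    """Chance that a species with the given relative frequency shows up
    at least once among `clones` independently sampled clones
    (independence is an assumption of this sketch)."""
    return 1.0 - (1.0 - frequency) ** clones


# Clone counts quoted in the text (300 and 100) against the two
# frequencies it discusses (1% and 0.1%).
for clones in (300, 100):
    for frequency in (0.01, 0.001):
        p = detection_probability(frequency, clones)
        print(f"{clones} clones, frequency {frequency:.1%}: P(detected) = {p:.2f}")
```

Under these assumptions, a species present at 1% is very likely to turn up among 300 clones, whereas a species present at 0.1% is more likely than not to be missed.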

[0010] Current methodology has therefore been
unable to detect RNA genes which either
do not present a visually discernible whole
body phenotype, or are rare (e.g. rarer
than 0.1% of all RNA genes), and therefore
do not produce significant enough
quantities of RNA so as to be detected by
standard biochemical techniques.

SUMMARY OF INVENTION

[0011] The present invention relates to a novel
group of regulatory, non-protein coding
genes, which are functional in specifically
inhibiting translation of other genes, some
of which are known to be involved in
various diseases. Each gene in this novel
group of genes, here identified as "GAM" or
"Genomic Address Messengers",
specifically inhibits translation of one or
more other "target" genes by means of
complementary hybridization of a segment
of the RNA transcript encoded by GAM, to
an inhibitor site located in the
3' untranslated region of the mRNA of the
one or more "target" genes.

[0012] 

In various preferred embodiments, the
present invention seeks to provide an
improved method and system for specific
modulation of expression of specific
known "target" genes involved in
significant human diseases, and an improved
method and system for detection of
expression of these target genes.

[0013] 

Accordingly, the invention provides several
substantially pure DNAs (e.g., genomic
DNA, cDNA or synthetic DNA) each
encoding a novel gene of the GAM group
of genes, vectors comprising the DNAs,
probes comprising the DNAs, a method
and system for selectively modulating
translation of known "target" genes
utilizing the vectors, and a method and
system for detecting expression of known
"target" genes utilizing the probes.

[0014] 

By "substantially pure DNA" is meant DNA 
that is free of the genes which, in the 
naturally-occurring genome of the 
organism from which the DNA of the 
invention is derived, flank the genes 
discovered and isolated by the present 
invention. The term therefore includes, for 
example, a recombinant DNA which is 
incorporated into a vector, into an 
autonomously replicating plasmid or virus, 
or into the genomic DNA of a prokaryote 
or eukaryote at a site other than its natural 
site; or which exists as a separate molecule 
(e.g., a cDNA or a genomic or cDNA 
fragment produced by PCR or restriction 
endonuclease digestion) independent of
other sequences. It also includes a
recombinant DNA which is part of a hybrid 
gene encoding additional polypeptide 
sequence. 

[0015] "Inhibiting translation" is defined as the
ability to prevent synthesis of a specific
protein encoded by a respective gene, by
means of inhibiting the translation of the
mRNA of this gene. "Translation inhibitor
site" is defined as the minimal DNA
sequence sufficient to inhibit translation.

[0016] There is thus provided in accordance with
a preferred embodiment of the present
invention a bioinformatically detectable
novel gene encoding substantially pure
DNA wherein: RNA encoded by the
bioinformatically detectable novel gene is
about 18 to about 24 nucleotides in
length, and originates from an RNA
precursor, which RNA precursor is about
50 to about 120 nucleotides in length, a
nucleotide sequence of a first half of the
RNA precursor is a partial
inversed-reversed sequence of a nucleotide
sequence of a second half thereof, a
nucleotide sequence of the RNA encoded
by the novel gene is a partial
inversed-reversed sequence of a nucleotide
sequence of a binding site associated with
at least one target gene, the novel gene
cannot be detected by either of the
following: a visually discernible whole
body phenotype, and detection of 99.9% of
RNA species shorter than 25 nucleotides
expressed in a tissue sample, and a
function of the novel gene is
bioinformatically deducible.
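
The length and fold-back criteria recited above can be sketched as a simple screen. The following is a minimal illustration only and is not the detection system of the invention: it assumes ungapped pairing between the two halves of the precursor, counts Watson-Crick pairs only, and uses a hypothetical cutoff (min_pairing) to stand in for "partial" inversed-reversed similarity. All function names and the toy sequences are assumptions introduced for illustration.

```python
# Watson-Crick pairs only; G-U wobble pairs are ignored in this sketch.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def inversed_reversed(rna: str) -> str:
    """Complement every base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def pairing_fraction(precursor: str) -> float:
    """Fraction of first-half bases that pair with the mirrored
    second-half base (ungapped, so bulges are not modelled)."""
    half = len(precursor) // 2
    if half == 0:
        return 0.0
    first_half = precursor[:half]
    second_half = precursor[len(precursor) - half:]
    mirror = inversed_reversed(second_half)
    matches = sum(a == b for a, b in zip(first_half, mirror))
    return matches / half


def passes_screen(precursor: str, mature: str, min_pairing: float = 0.7) -> bool:
    """Apply the length and fold-back criteria; min_pairing is an assumed cutoff."""
    return (
        50 <= len(precursor) <= 120
        and 18 <= len(mature) <= 24
        and mature in precursor
        and pairing_fraction(precursor) >= min_pairing
    )


# Toy precursor: one arm, a short loop, then the inversed-reversed arm.
ARM = "GGAUCCGAUUCGAGCUAGGCUAACG"
TOY_PRECURSOR = ARM + "UUCAAGAGA" + inversed_reversed(ARM)
TOY_MATURE = ARM[:22]

print(passes_screen(TOY_PRECURSOR, TOY_MATURE))
```

A practical screen would rely on a proper RNA folding model rather than position-by-position comparison; the sketch is intended only to make the "inversed-reversed" relationship concrete.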

[0017] 

There is further provided in accordance
with another preferred embodiment of the
present invention a bioinformatically
detectable novel gene encoding
substantially pure DNA wherein: RNA
encoded by the bioinformatically
detectable novel gene includes a plurality
of RNA sections, each of the RNA sections
being about 50 to about 120 nucleotides
in length, and including an RNA segment,
which RNA segment is about 18 to about
24 nucleotides in length, a nucleotide
sequence of a first half of each of the RNA
sections encoded by the novel gene is a
partial inversed-reversed sequence of a
nucleotide sequence of a second half
thereof, a nucleotide sequence of each of
the RNA segments encoded by the novel
gene is a partial inversed-reversed
sequence of the nucleotide sequence of a
binding site associated with at least one
target gene, and a function of the novel
gene is bioinformatically deducible from
the following data elements: the nucleotide
sequence of the RNA encoded by the novel
gene, a nucleotide sequence of the at least
one target gene, and a function of the at
least one target gene.

[0018]

There is still further provided in
accordance with another preferred
embodiment of the present invention a
bioinformatically detectable novel gene
encoding substantially pure DNA wherein:
RNA encoded by the bioinformatically
detectable novel gene is about 18 to about
24 nucleotides in length, and originates
from an RNA precursor, which RNA
precursor is about 50 to about 120
nucleotides in length, a nucleotide
sequence of a first half of the RNA
precursor is a partial inversed-reversed
sequence of a nucleotide sequence of a
second half thereof, a nucleotide sequence
of the RNA encoded by the novel gene is a
partial inversed-reversed sequence of a
nucleotide sequence of a binding site
associated with at least one target gene, a
function of the novel gene is modulation of
expression of the at least one target gene,
and the at least one target gene does not
encode a protein.

[0019] There is additionally provided in
accordance with another preferred
embodiment of the present invention a
bioinformatically detectable novel gene
encoding substantially pure DNA wherein:
the bioinformatically detectable novel gene
does not encode a protein, RNA encoded
by the bioinformatically detectable novel
gene is maternally transferred by a cell to
at least one daughter cell of the cell, a
function of the novel gene includes
modulation of a cell type of the daughter
cell, and the modulation is
bioinformatically deducible.

[0020] There is moreover provided in accordance
with another preferred embodiment of the
present invention a bioinformatically
detectable novel gene encoding
substantially pure DNA wherein: the
bioinformatically detectable novel gene
does not encode a protein, a function of
the novel gene is promotion of expression
of the at least one target gene, and the at
least one target gene is bioinformatically
deducible.

[0021] 

Further in accordance with a preferred 
embodiment of the present invention the 
function of the novel gene is 
bioinformatically deducible from the 
following data elements: the nucleotide 
sequence of the RNA encoded by the
bioinformatically detectable novel gene, a
nucleotide sequence of the at least one 
target gene, and a function of the at least 
one target gene. 

[0022] Still further in accordance with a preferred 
embodiment of the present invention the 
RNA encoded by the novel gene 
complementarily binds the binding site 
associated with the at least one target 
gene, thereby modulating expression of 
the at least one target gene. 

[0023] Additionally in accordance with a preferred 
embodiment of the present invention the 
binding site associated with at least one 
target gene is located in an untranslated 
region of RNA encoded by the at least one 
target gene.

[0024] Moreover in accordance with a preferred
embodiment of the present invention the 
function of the novel gene is selective 
inhibition of translation of the at least one 
target gene, which selective inhibition 
includes complementary hybridization of 
the RNA encoded by the novel gene to the 
binding site. 
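
The complementary binding described in the preceding embodiments can be sketched as a scan of an untranslated region for windows that are approximately the inversed-reversed sequence of the small RNA. The following is an illustrative sketch only: the function find_binding_sites, the mismatch allowance max_mismatches, and the toy sequences are assumptions introduced here, and real binding-site detection would also weigh the position and energetics of the pairing.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def inversed_reversed(rna: str) -> str:
    """Complement every base and reverse the order."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_binding_sites(small_rna: str, utr: str, max_mismatches: int = 4):
    """Return (start, mismatches) for each UTR window that differs from
    the perfectly complementary site in at most max_mismatches positions."""
    ideal_site = inversed_reversed(small_rna)
    width = len(ideal_site)
    hits = []
    for start in range(len(utr) - width + 1):
        window = utr[start:start + width]
        mismatches = sum(a != b for a, b in zip(window, ideal_site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy data: a 22 nt RNA and a UTR with one perfect site embedded at offset 8.
TOY_RNA = "UGAGGUAGUAGGUUGUAUAGUU"
TOY_UTR = "ACGUACGU" + inversed_reversed(TOY_RNA) + "GGCCAUAU"

for start, mismatches in find_binding_sites(TOY_RNA, TOY_UTR):
    print(f"candidate site at offset {start} with {mismatches} mismatches")
```

Allowing a few mismatches reflects the "partial" complementarity recited in the embodiments; the specific allowance of four is arbitrary.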

[0025] Further in accordance with a preferred
embodiment of the present invention the
invention includes a vector including the
DNA.

[0026] Still further in accordance with a preferred 
embodiment of the present invention the 
invention includes a method of selectively 
inhibiting translation of at least one gene, 
including introducing the vector. 



[0027] Moreover in accordance with a preferred
embodiment of the present invention the
introducing includes utilizing the RNAi
pathway.

[0028] Additionally in accordance with a preferred 
embodiment of the present invention the 
invention includes a gene expression 
inhibition system including: the vector, and 
a vector inserter, functional to insert the 
vector into a cell, thereby selectively 
inhibiting translation of at least one gene. 

[0029] Further in accordance with a preferred
embodiment of the present invention the
invention includes a probe including the
DNA.

[0030] Still further in accordance with a preferred
embodiment of the present invention the
invention includes a method of selectively
detecting expression of at least one gene,
including using the probe.

[0031] Additionally in accordance with a preferred 
embodiment of the present invention the 
invention includes a gene expression 
detection system including: the probe, and 
a gene expression detector functional to 
selectively detect expression of at least 
one gene. 

BRIEF DESCRIPTION OF DRAWINGS 

[0032] Fig. 1 is a simplified diagram illustrating
the genomic differentiation enigma that
the present invention addresses;

[0033] Figs. 2 through 4 are schematic diagrams
which when taken together provide an
analogy that illustrates a conceptual model
of the present invention, addressing the
genomic differentiation enigma;

[0034] Figs. 5A and 5B are schematic diagrams, 
which when taken together illustrate a 
"genomic records" concept of the 
conceptual model of the present invention, 
addressing the genomic differentiation 
enigma; 

[0035] Fig. 6 is a schematic diagram illustrating a 
"genomically programmed cell 
differentiation" concept of the conceptual 
model of the present invention, addressing 
the genomic differentiation enigma; 

[0036] Fig. 7 is a schematic diagram illustrating a
"genomically programmed cell-specific
protein expression modulation" concept of
the conceptual model of the present
invention, addressing the genomic
differentiation enigma;

[0037] Fig. 8 is a simplified diagram illustrating a
mode by which genes of a novel group of
genes of the present invention modulate
expression of known target genes;

[0038] Fig. 9 is a simplified block diagram
illustrating a bioinformatic gene detection
system capable of detecting genes of the
novel group of genes of the present
invention, which system is constructed and
operative in accordance with a preferred
embodiment of the present invention;

[0039] Fig. 10 is a simplified flowchart illustrating
operation of a mechanism for training of a
computer system to recognize the novel
genes of the present invention, which
mechanism is constructed and operative in
accordance with a preferred embodiment
of the present invention;

[0040] Fig. 11A is a simplified block diagram of a
non-coding genomic sequence detector
constructed and operative in accordance
with a preferred embodiment of the
present invention;

[0041] Fig. 11B is a simplified flowchart
illustrating operation of a non-coding
genomic sequence detector constructed
and operative in accordance with a
preferred embodiment of the present
invention;

[0042] Fig. 12A is a simplified block diagram of a
hairpin detector constructed and operative
in accordance with a preferred
embodiment of the present invention;

[0043] Fig. 12B is a simplified flowchart
illustrating operation of a hairpin detector
constructed and operative in accordance
with a preferred embodiment of the
present invention;

[0044] Fig. 13A is a simplified block diagram of a
dicer-cut location detector constructed
and operative in accordance with a
preferred embodiment of the present
invention;

[0045] Fig. 13B is a simplified flowchart
illustrating training of a dicer-cut location
detector constructed and operative in
accordance with a preferred embodiment
of the present invention;

[0046] Fig. 14A is a simplified block diagram of a
target-gene binding-site detector
constructed and operative in accordance
with a preferred embodiment of the
present invention;

[0047] Fig. 14B is a simplified flowchart
illustrating operation of a target-gene
binding-site detector constructed and
operative in accordance with a preferred
embodiment of the present invention;

[0048] Fig. 15 is a simplified flowchart illustrating
operation of a function & utility analyzer
constructed and operative in accordance
with a preferred embodiment of the
present invention;

[0049] Fig. 16 is a simplified diagram describing a
novel bioinformatically detected group of
regulatory genes, referred to here as
Genomic Record (GR) genes, each of which
encodes an "operon-like" cluster of novel
miRNA-like genes, which in turn
modulates expression of a plurality of
target genes;

[0050] Fig. 17 is a simplified diagram illustrating
a mode by which genes of a novel group of
operon-like genes of the present
invention modulate expression of other
such genes, in a cascading manner;

[0051] Fig. 18 is a block diagram illustrating an
overview of a methodology for finding
novel genes and operons of the present
invention, and their respective functions;

[0052] Fig. 19 is a block diagram illustrating
different utilities of genes of a novel group
of genes, and operons of a novel group of
operons, both of the present invention;

[0053] Figs. 20A and 20B are simplified diagrams, 
which when taken together illustrate a 
mode of gene therapy applicable to genes 
of the novel group of genes of the present 
invention; 

[0054] Fig. 21A is an annotated sequence of
EST72223 comprising novel gene GAM24
detected by the gene detection system of
the present invention;

[0055] Figs. 21B and 21C are pictures of
laboratory results, which when taken
together demonstrate laboratory
confirmation of expression of the
bioinformatically detected novel gene
GAM24 of Fig. 21A;

[0056] Fig. 21D provides pictures of laboratory
results, which when taken together
demonstrate further laboratory
confirmation of expression of the
bioinformatically detected novel gene
GAM24 of Fig. 21A;

[0057] Fig. 22A is an annotated sequence of an
EST7929020 comprising novel genes
GAM23 and GAM25 detected by the gene
detection system of the present invention;

[0058] Fig. 22B is a picture of laboratory results,
which confirm expression of
bioinformatically detected novel genes
GAM23 and GAM25 of Fig. 22A;

[0059] Fig. 22C is a picture of laboratory results,
which confirm endogenous expression of
bioinformatically detected novel gene
GAM25 of Fig. 22A;

[0060] Fig. 23A is an annotated sequence of an
EST1388749 comprising novel gene
GAM26 detected by the gene detection
system of the present invention;

[0061] Fig. 23B is a picture of laboratory results,
which confirm expression of the
bioinformatically detected novel gene
GAM26 of Fig. 23A;

[0062] Figs. 24A through 20625D are schematic
diagrams illustrating sequences, functions
and utilities of 20602 specific genes of the
novel group of genes of the present
invention, detected using the bioinformatic
gene detection system described
hereinabove with reference to Figs. 8
through 15; and

[0063] Figs. 20626 through 27262 are schematic
diagrams illustrating sequences, functions
and utilities of 6636 specific genes of a
group of novel regulatory "operon-like"
genes of the present invention, detected
using the bioinformatic gene detection
system described hereinabove with
reference to Figs. 8 through 15.

BRIEF DESCRIPTION OF SEQUENCES 

[0064] 

A Sequence Listing of genomic sequences
of the present invention designated SEQ
ID:1 through SEQ ID:1388482 is attached
to this application, enclosed in computer
readable form on CD-ROM. The genomic
listing comprises the following nucleotide
sequences: Genomic sequences designated
SEQ ID:1 through SEQ ID:20602 are
nucleotide sequences of 20602 gene
precursors of respective novel genes of the
present invention; Genomic sequences
designated SEQ ID:20603 through SEQ
ID:41204 are nucleotide sequences of
20602 genes of the present invention;
and Genomic sequences designated SEQ
ID:41205 through SEQ ID:1388482 are
nucleotide sequences of 1347200 gene
precursors of respective novel genes of the
present invention.
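
For readers working with the Sequence Listing programmatically, the three SEQ ID ranges recited above can be expressed as a small lookup. The following sketch is illustrative only; the table name SEQ_ID_RANGES and the function describe_seq_id are hypothetical, and the category labels merely paraphrase the wording of this paragraph.

```python
# (first SEQ ID, last SEQ ID, category), taken from the ranges recited above.
SEQ_ID_RANGES = [
    (1, 20602, "gene precursor"),
    (20603, 41204, "gene"),
    (41205, 1388482, "gene precursor"),
]


def describe_seq_id(seq_id: int) -> str:
    """Return the category of sequence that a SEQ ID designates."""
    for first, last, category in SEQ_ID_RANGES:
        if first <= seq_id <= last:
            return category
    raise ValueError(f"SEQ ID {seq_id} is outside the Sequence Listing")


for example in (1, 20603, 41205):
    print(example, describe_seq_id(example))
```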

DETAILED DESCRIPTION 

[0065] Reference is now made to Fig. 1 which is a 
simplified diagram providing a conceptual 
explanation of a genomic differentiation 
enigma, which the present invention 
addresses.

[0066] Fig. 1 depicts different cell types in an
organism, such as CARTILAGE CELL, LIVER
CELL, FIBROBLAST CELL and BONE CELL, all
containing identical DNA, and deriving
from the initial FERTILIZED EGG CELL, and
yet each of these cells expressing different
proteins, and hence acquiring different
shape and function.

[0067] 

The present invention proposes that the 
inevitable conclusion from this constraint 
is, however, strikingly simple: the coding 
system used must be modular. It must 
comprise multiple modules, or records, 
one for each cell-type, and a mechanism 
whereby each cell at its inception is 
instructed which record to open, and 
behaves according to instructions in that
record.

[0068] 

This modular code concept is somewhat 
difficult to grasp, since we are strongly 
habituated to viewing things from an 
external viewpoint. An architect, for 
example, looks at a blueprint of a building, 
which details exactly where each element 
(block, window, door, electrical switch, 
etc.) is to be placed relative to all other 
elements, and then instructs builders to 
place these elements in their designated 
places. This is an external viewpoint: the 
architect is external to the blueprint, which 
itself is external to the physical building, 
and its different elements. The architect 
may therefore act as an "external 
organizing agent": seeing the full picture
and the relationships between all elements,
and being able to instruct from the outside 
where to place each of them. 

[0069] Genomic differentiation coding evidently
works differently, without any such
external organizing agent: it comprises
only one smart block (the first cell), which
is both the architect and the blueprint, and
which continuously duplicates itself,
somehow knowing when to manifest itself
as a block and when as a window, door, or
electrical switch.

[0070] Reference is now made to Figs. 2 through
4 which are schematic diagrams which
when taken together provide an analogy
that illustrates a conceptual model of the
present invention, addressing the genomic
differentiation enigma.

[0071]

Reference is now made to Fig. 2A. Imagine
a very talented chef, capable of preparing
any meal provided he is given specific
written cooking instructions. This chef is
equipped with two items: (a) a thick recipe
book, and (b) a small note with a number
scribbled on it. The book comprises
multiple pages, each page detailing how to
prepare a specific meal. The small note
indicates the page to be opened, and
therefore the meal to be prepared. The
chef looks at the page-number written on
the note, opens the recipe book at the
appropriate page, and prepares the meal
according to the written instructions on
this page. As an example, Fig. 2A depicts a
CHEF holding a note with the number 12
written on it; he opens the book on page
12, and since that page contains the recipe
for preparing BREAD, the CHEF prepares a
loaf of BREAD.

[0072] Reference is now made to Fig. 2B, which
depicts two identical chefs, CHEF A and
CHEF B, holding an identical recipe book.
Despite their identity, and the identity of
their recipe book, the two chefs prepare
different meals: CHEF A holds a
note numbered 12, and therefore opens
the book on page 12 and prepares BREAD,
whereas CHEF B holds a note numbered 34
and therefore opens the book on page 34
and prepares a PIE.

[0073] Reference is now made to Fig. 3. Imagine
the chef of the analogy is also capable of
duplicating himself once he has finished
preparing the specified meal. The format
of the book is such that at the bottom of
each page, two numbers are written. When
he has finished preparing the meal
specified on that page, the chef is trained
to do the following: (i) divide himself into
two identical duplicate chefs, (ii) duplicate
the recipe book and hand a copy to each of
his duplicate chefs, and (iii) write down the
two numbers found at the bottom of the
page of the meal he prepared, on two
small notes, handing one note to each of
his two duplicate chefs.

[0074] 

Each of the two resulting duplicate chefs
is now equipped with the same book, and
has the same talent to prepare any meal,
but since each of them received a different
note, they will now prepare different
meals.

[0075] Fig. 3 depicts CHEF A holding a recipe
book and receiving a note numbered 12.
CHEF A therefore opens the book on page
12 and prepares BREAD. When he is
finished making bread, CHEF A performs
the following actions: (i) divides himself
into two duplicate chefs, designated CHEF
B and CHEF C, (ii) duplicates his recipe
book handing a copy to each of CHEF B
and CHEF C, (iii) writes down the numbers
found at the bottom of page 12, numbers
34 and 57, on two notes, handing note
numbered 34 to CHEF B and note
numbered 57 to CHEF C.

[0076] Accordingly, CHEF B receives a note
numbered 34 and therefore opens the
recipe book on page 34 and prepares PIE,
whereas CHEF C receives a note numbered
57 and therefore opens the book on page
57 and prepares RICE.

[0077] It is appreciated that while CHEF A, CHEF B
& CHEF C are identical and hold identical
recipe books, they each prepare a different
meal. It is also appreciated that the meals
prepared by CHEF B and CHEF C are
determined by CHEF A, and are mediated by
the differently numbered notes passed on
from CHEF A to CHEF B and CHEF C.

[0078] It is further appreciated that the
mechanism illustrated by Fig. 3 enables an
unlimited lineage of chefs to divide into
duplicate, identical chefs and to determine
the meals those duplicate chefs would
prepare. For example, having been
directed to page 34, when CHEF B divides
into duplicate chefs (not shown), he will
instruct his two duplicate chefs to prepare
meals specified on pages 14 and 93
respectively, according to the numbers at
the bottom of page 34 to which he was
directed. Similarly, CHEF C will instruct his
duplicate chefs to prepare meals specified
on pages 21 and 46 respectively, etc.
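
The lineage mechanism of the analogy can be sketched as a short recursive program. The following is an illustration of the analogy only, not of any biological implementation: the page numbers and meals are those given in the text, the names Page, RECIPE_BOOK and lineage are hypothetical, and pages whose contents the text does not specify (14, 93, 21 and 46) are reported with a meal of None.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Page:
    meal: str
    child_notes: tuple  # the two page numbers written at the bottom of the page


# Pages described in the analogy; every chef carries an identical copy.
RECIPE_BOOK = {
    12: Page("BREAD", (34, 57)),
    34: Page("PIE", (14, 93)),
    57: Page("RICE", (21, 46)),
}


def lineage(note: int, book: dict, depth: int = 0):
    """Yield (generation, note, meal) for a chef and all of his descendants."""
    page = book.get(note)
    if page is None:
        # The text does not say what is written on this page.
        yield depth, note, None
        return
    yield depth, note, page.meal
    for child_note in page.child_notes:
        yield from lineage(child_note, book, depth + 1)


for generation, note, meal in lineage(12, RECIPE_BOOK):
    print("  " * generation + f"note {note}: {meal}")
```

The only thing that differs between chefs is the note each receives; the book passed down the lineage is always the same object.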

[0079] 

Reference is now made to Fig. 4. Imagine
that the cooking instructions on each page
of the recipe book are written in shorthand
format: the main meal-page to which the
chef was directed by the scribbled note
merely contains a list of numbers which
direct him to multiple successive pages,
each specifying how to prepare an
ingredient of that meal.

[0080] As an example, Fig. 4 depicts CHEF A of
Figs. 2 and 3, holding a recipe book and a
note numbered 12. Accordingly, CHEF A
opens the recipe book on page 12, which
details the instructions for preparing
BREAD. However, the "instructions" on
making BREAD found on page 12 consist
of only three numbers, 18, 7 and 83, which
"refer" CHEF A to pages detailing
preparation of the ingredients of BREAD:
FLOUR, MILK and SALT, respectively.

[0081] As illustrated in Fig. 4, turning from the
main "meal page" (e.g. page 12) to respective
"ingredient pages" (e.g. pages 18, 7 & 83)
is mediated by scribbled notes with the
page-numbers written on them. In this
analogy, the scribbled notes are required
for seeking the target pages to be turned
to, both when turning to main "meal
pages" (e.g. page 12) and when
turning to "ingredient pages" (e.g. pages
18, 7 & 83).
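
The shorthand format of the recipe book amounts to one level of indirection: a meal page holds only page numbers, and each of those numbers is followed in the same way as the original note. A minimal sketch, using the page numbers and ingredients given in the text and the hypothetical names MEAL_PAGES, INGREDIENT_PAGES and prepare, might read:

```python
# A meal page lists only the page numbers of its ingredients.
MEAL_PAGES = {12: ("BREAD", [18, 7, 83])}

# An ingredient page says how to prepare one ingredient.
INGREDIENT_PAGES = {18: "FLOUR", 7: "MILK", 83: "SALT"}


def prepare(note: int):
    """Follow a note to its meal page, then follow each number listed
    there to the corresponding ingredient page."""
    meal, ingredient_notes = MEAL_PAGES[note]
    ingredients = [INGREDIENT_PAGES[n] for n in ingredient_notes]
    return meal, ingredients


meal, ingredients = prepare(12)
print(meal, "requires", ", ".join(ingredients))
```

The same lookup step, a number resolved to a page, serves both for finding the meal page and for finding each ingredient page.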

[0082] 

The chef in the given analogy, 
schematically depicted in FIGS 2 through 4, 
represents a cell; the thick recipe book 
represents the DNA; preparing a meal in 
the given analogy represents the cell 
manifesting itself as a specific cell-type; 
and ingredients of a meal represent 
proteins expressed by that cell-type. Like 
the chef equipped with the thick recipe 
book in the given analogy, all cells in an 
organism contain the same DNA and are 
therefore each potentially capable of 
manifesting itself as any cell-type, 
expressing proteins typical of that cell 
type. 

[0083] Reference is now made to Figs. 5A and 5B 
which are schematic diagrams, which when 
taken together illustrate a "genomic 
records" concept of the conceptual model 
of the present invention, addressing the 
genomic differentiation enigma. 

[0084] The Genomic Records concept asserts that 
the DNA (the thick recipe book in the 
illustration) comprises a very large number 
of Genomic Records (analogous to pages in 
the recipe book), each containing the 
instructions for differentiation of a 
different cell-type, or developmental 
process. Each Genomic Record is headed 
by a very short genomic sequence which 
functions as a "Genomic Address" of that 
Genomic Record (analogous to the page 
number in the recipe book). At its 
inception, in addition to the DNA, each cell 
also receives a short RNA segment (the 
scribbled note in the illustration). This 
short RNA segment binds complementarily 
to a "Genomic Address" sequence of one of 
the Genomic Records, thereby activating 
that Genomic Record, and accordingly 
determining the cell's fate (analogous to 
opening the book on the page 
corresponding to the number on the 
scribbled note, thereby determining the 
meal to be prepared). 

[0085] 

Reference is now made to Fig. 5A. A CELL 
is illustrated which comprises a GENOME. 
The GENOME comprises a plurality of 
GENOMIC RECORDS, each of which 
correlates to a specific cell type (for clarity 
only 6 sample genomic records are 
shown). Each genomic record comprises 
genomic instructions on differentiation 
into a specific cell-type, as further 
elaborated below with reference to Fig. 7. 
At cell inception, the CELL receives a 
maternal short RNA segment, which 
activates one of the GENOMIC RECORDS, 
causing the cell to differentiate according 
to the instructions comprised in that 
genomic record. As an example, Fig. 5A 
illustrates reception of a maternal short 
RNA segment designated A" and outlined 
by a broken line, which activates the FIBRO 
genomic record, causing the cell to 
differentiate into a FIBROBLAST CELL. 

[0086] 

Reference is now made to Fig. 5B, which is 
a simplified schematic diagram, illustrating 
cellular differentiation mediated by the 
"Genomic Records" concept. Fig. 5B depicts 
2 cells in an organism, designated CELL A 
and CELL B, each having a GENOME. It is 
appreciated that since CELL A and CELL B 
are cells in the same organism, the 
GENOME of CELL A is identical to that of 
CELL B. Despite having an identical 
GENOME, CELL A differentiates differently 
from CELL B, due to activation of different 
genomic records in these two cells. In CELL 
A the FIBRO GENOMIC RECORD is activated, 
causing CELL A to differentiate into a 
FIBROBLAST CELL, whereas in CELL B the 
BONE GENOMIC RECORD is activated, 
causing CELL B to differentiate into a 
BONE CELL. The cause for activation of 
different genomic records in these two 
cells is the different maternal short RNA 
segment each of them received: CELL A received 
a maternal short RNA segment designated 
A" which activated genomic record FIBRO, 
whereas CELL B received a maternal short 
RNA segment designated B" which 
activated genomic record BONE. 

[0087] Reference is now made to Fig. 6 which is a 
schematic diagram illustrating a 
"genomically programmed cell 
differentiation" concept of the conceptual 
model of the present invention, addressing 
the genomic differentiation enigma. 

[0088] A cell designated CELL A divides into 2 
cells designated CELL B and CELL C. CELL 
A, CELL B and CELL C each comprise a 
GENOME, which GENOME comprises a 
plurality of GENOMIC RECORDS. It is 
appreciated that since CELL A, CELL B and 
CELL C are cells in the same organism, the 
GENOME of these cells, and the GENOMIC 
RECORDS comprised therein, are identical. 

[0089] As described above with reference to Fig. 
5B, at its inception, CELL A receives a 
maternal short RNA segment, designated 
A" and marked by a broken line, which 
activates the FIBRO genomic record, 
thereby causing CELL A to differentiate into 
a FIBROBLAST CELL. However, Fig. 6 shows 
further details of the genomic records: 
each cell genomic record also comprises 
two short genomic sequences, referred to 
here as Daughter Cell Genomic Addresses. 
Blocks designated B and C are Daughter 
Cell Genomic Addresses of the FIBRO 
Genomic Record. At cell division, each 
parent cell transcribes two short RNA 
segments, corresponding to the two 
Daughter Cell Genomic Addresses of the 
Genomic Record of that parent cell, and 
transfers one to each of its two daughter 
cells. CELL A of Fig. 6 transcribes and 
transfers to its two respective daughter 
cells, two short RNA segments, outlined by 
a broken line and designated B" and C", 
corresponding to daughter cell genomic 
addresses designated B and C comprised 
in the FIBRO genomic record. 

[0090] 

CELL B therefore receives the above 
mentioned maternal short RNA segment 
designated B", which binds 
complementarily to genomic address 
designated B of genomic record BONE, 
thereby activating this genomic record, 
which in turn causes CELL B to differentiate 
into a BONE CELL. Similarly, CELL C 
receives the above mentioned maternal 
short RNA segment designated C", which 
binds complementarily to genomic address 
designated C of genomic record CARTIL., 
thereby activating this genomic record, 
which in turn causes CELL C to 
differentiate into a CARTILAGE CELL. 

[0091] It is appreciated that the mechanism 
illustrated by Fig. 6 enables an unlimited 
lineage of cells to divide into daughter 
cells containing the same DNA, and to 
determine the cell-fate of these daughter 
cells. For example, when CELL B and CELL 
C divide into their respective daughter cells 
(not shown), they will transfer short RNA 
segments designated D" & E", and F" & G" 
respectively, to their respective daughter 
cells. The cell fate of each of these 
daughter cells would be determined by the 
identity of the maternal short RNA 
segment they receive, which would 
determine the genomic record activated. 
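
By way of non-limiting illustration only, the lineage mechanism of Fig. 6 may be sketched as a small lookup model. The sketch below is a toy under stated assumptions: the class and function names are hypothetical, and a short RNA segment is represented simply by the letter of the genomic address it binds, so that complementary binding is abstracted into a dictionary lookup. Only the addresses and cell types named above are taken from the description.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class GenomicRecord:
    """One genomic record: a cell type plus two Daughter Cell Genomic Addresses."""
    cell_type: str
    daughter_addresses: Tuple[str, str]


# The GENOME is identical in every cell: genomic address -> genomic record.
# Records headed by addresses D to G are not shown in Fig. 6 and are omitted.
GENOME: Dict[str, GenomicRecord] = {
    "A": GenomicRecord("FIBROBLAST", ("B", "C")),
    "B": GenomicRecord("BONE", ("D", "E")),
    "C": GenomicRecord("CARTILAGE", ("F", "G")),
}


def differentiate(short_rna: str) -> GenomicRecord:
    """Activate the record whose genomic address the received short RNA binds."""
    return GENOME[short_rna]


def divide(short_rna: str) -> Tuple[str, str]:
    """Return the two short RNA segments a parent cell hands to its daughters."""
    return differentiate(short_rna).daughter_addresses


if __name__ == "__main__":
    for daughter_rna in divide("A"):
        print(daughter_rna, "->", differentiate(daughter_rna).cell_type)
```

In this toy, every cell carries the same GENOME dictionary, and the cell fate depends only on which short RNA segment the cell receives, which is the point the analogy is intended to convey.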

[0092] Reference is now made to Fig. 7 which is a 
schematic diagram illustrating a 
"genomically programmed cell-specific 
protein expression modulation" concept of 
the conceptual model of the present 
invention, addressing the genomic 
differentiation enigma. 

[0093] 

CELL A receives a maternal short RNA 
segment designated A", which activates a 
genomic record designated FIBRO, by anti- 
sense binding to a binding site "header" of 
this genomic record, designated A. 
Genomic record FIBRO encodes 3 short 
RNA segments, designated 1, 2 and 4 
respectively, which modulate expression of 
target genes designated GENE1, GENE2 
and GENE4 respectively. Modulation of 
expression of these genes results in CELL 
A differentiating into a FIBROBLAST CELL. 

[0094] Reference is now made to Fig. 8 which is a 
simplified diagram illustrating a mode by 
which genes of a novel group of genes of 
the present invention modulate 
expression of known target genes. 

[0095] The novel genes of the present invention 
are micro RNA (miRNA)-like, regulatory 
RNA genes, modulating expression of 
known target genes. This mode of 
modulation is common to other known 
miRNA genes, as described hereinabove 
with reference to the background of the 
invention section. 

[0096] GAM GENE and TARGET GENE are two 
human genes contained in the DNA of the 
human genome. 

[0097] GAM GENE encodes a GAM PRECURSOR 
RNA. However, similar to other miRNA 
genes, and unlike most ordinary genes, its 
RNA, GAM PRECURSOR RNA, does not 
encode a protein. 

[0098] GAM PRECURSOR RNA folds onto itself, 
forming GAM FOLDED PRECURSOR RNA. As 
Fig. 8 illustrates, GAM FOLDED PRECURSOR 
RNA forms a "hairpin structure", folding 
onto itself. As is well known in the art, this 
"hairpin structure" is typical of genes of the 
miRNA group, and is due to the fact that the 
nucleotide sequence of the first half of the 
RNA of a gene in this group is an accurate 
or partial inversed-reversed sequence of 
the nucleotide sequence of its second half. 
By "inversed-reversed" is meant a sequence 
which is reversed and wherein each 
nucleotide is replaced by a complementary 
nucleotide, as is well known in the art 
(e.g. ATGGC is the inversed-reversed 
sequence of GCCAT). 
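
By way of non-limiting illustration only, the "inversed-reversed" operation defined above may be sketched as follows. The function name is hypothetical, and the sketch assumes the four-letter DNA alphabet used in the example above.

```python
# Translation table mapping each DNA nucleotide to its complementary nucleotide.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def inversed_reversed(sequence: str) -> str:
    """Replace each nucleotide by its complement, then reverse the sequence."""
    return sequence.upper().translate(DNA_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(inversed_reversed("GCCAT"))  # the example given in the text: ATGGC
```

Applying the operation twice returns the original sequence, which is why the relation between the two halves of a hairpin is symmetric.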

[0099] 

An enzyme complex, designated DICER 
COMPLEX, "dices" the GAM FOLDED 
PRECURSOR RNA into a single stranded 
RNA segment, about 22 nucleotides long, 
designated GAM RNA. As is known in the 
art, "dicing" of the hairpin structured RNA 
precursor into shorter RNA segments 
about 22 nucleotides long by a Dicer type 
enzyme is catalyzed by an enzyme 
complex comprising an enzyme called 
Dicer together with other necessary 
proteins. Nucleotide sequences of GAM 
PRECURSOR RNAs and of GAM RNAs, and a 
schematic representation of the secondary 
folding of GAM FOLDED PRECURSOR RNA 
are further described with reference to 
Table 1. 

[0100] TARGET GENE encodes a corresponding 
messenger RNA, designated TARGET RNA. 
This TARGET RNA comprises 3 regions: a 
5' untranslated region, a protein coding 
region and a 3' untranslated region, 
designated 5'UTR, PROTEIN CODING and 
3'UTR respectively. 

[0101] GAM RNA binds complementarily to a 
BINDING SITE, located on the 3'UTR 
segment of TARGET RNA. This 
complementary binding is due to the fact 
that the nucleotide sequence of GAM RNA 
is an accurate or partial inversed-reversed 
sequence of the nucleotide sequence of 
BINDING SITE. 
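
By way of non-limiting illustration only, accurate or partial complementarity between a short RNA and a candidate binding site may be sketched as a sliding-window comparison. This is a simplified stand-in and not the binding-site detector of the present invention: the function names and the `max_mismatches` parameter are hypothetical, only Watson-Crick pairs are counted as matches, and no free-energy considerations are applied.

```python
from typing import List, Tuple

# Translation table mapping each RNA nucleotide to its complementary nucleotide.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def inversed_reversed_rna(sequence: str) -> str:
    """Complement each RNA nucleotide, then reverse the sequence."""
    return sequence.translate(RNA_COMPLEMENT)[::-1]


def find_binding_sites(small_rna: str, utr: str,
                       max_mismatches: int = 0) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for each window of utr that pairs with small_rna.

    A window is reported when it differs from the inversed-reversed sequence of
    small_rna in at most max_mismatches positions.
    """
    target = inversed_reversed_rna(small_rna)
    hits = []
    for start in range(len(utr) - len(target) + 1):
        window = utr[start:start + len(target)]
        mismatches = sum(1 for a, b in zip(window, target) if a != b)
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # Toy sequences invented for this illustration only.
    print(find_binding_sites("GCCAU", "UUAUGGCUU"))
    print(find_binding_sites("GCCAU", "UUAUGCCUU", max_mismatches=1))
```

Setting `max_mismatches` to zero corresponds to an "accurate" inversed-reversed sequence, whereas a positive value corresponds to a "partial" one.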

[0102] 

The complementary binding of GAM RNA to 
BINDING SITE inhibits translation of 
TARGET RNA into TARGET PROTEIN. 
TARGET PROTEIN is therefore outlined by a 
broken line. Nucleotide sequences of 
target binding sites, such as BINDING SITE-I, 
BINDING SITE-II and BINDING SITE-III of 
Fig. 8, and a schematic representation of 
the complementarity of each of these 
target binding sites to GAM RNA, are 
described with reference to Table 2. 

[0103] 

It is appreciated by one skilled in the art 
that the mode of translational inhibition 
illustrated by Fig. 8 with specific reference 
to GAM genes of the present invention is 
in fact common to all other miRNA genes. 
A specific complementary binding site has 
been demonstrated only for Lin-4 and Let-7. 
All the other 93 newly discovered miRNA 
genes are also believed by those skilled in 
the art to modulate expression of other 
genes by complementary binding, although 
specific complementary binding sites for 
these genes have not yet been found 
(Ruvkun G., "Perspective: Glimpses of a tiny 
RNA world", Science 294, 779 (2001)). The 
present invention discloses a novel group 
of genes, the GAM genes, belonging to the 
miRNA genes group, and for which a 
specific complementary binding site has 
been determined. Some GAM genes of the 
present invention are described in more 
detail with reference to Table 3. 

[0104] Reference is now made to Fig. 9 which is a 
simplified block diagram illustrating a 
bioinformatic gene detection system 
capable of detecting genes of the novel 
group of genes of the present invention, 
which system is constructed and operative 
in accordance with a preferred 
embodiment of the present invention. 

[0105] A centerpiece of the present invention is a 
bioinformatic gene detection engine 100, 
which is a preferred implementation of a 
mechanism capable of bioinformatically 
detecting genes of the novel group of 
genes of the present invention. 

[0106] The function of the bioinformatic gene 
detection engine 100 is as follows: it 
receives three types of input, expressed 
RNA data 102, sequenced DNA data 104, 
and protein function data 106, performs a 
complex process of analysis of this data as 
elaborated below, and based on this 
analysis produces output of a 
bioinformatically detected group of novel 
genes designated 108. 

[0107] Expressed RNA data 102 comprises 
published expressed sequence tags (EST) 
data, published mRNA data, as well as 
other sources of published RNA data. 
Sequenced DNA data 104 comprises 
alphanumeric data describing sequenced 
genomic data, which preferably includes 
annotation data such as location of known 
protein coding regions relative to the 
sequenced data. Protein function data 106 
comprises scientific publications reporting 
studies which elucidated the physiological 
function of known proteins, and their 
connection, involvement and possible 
utility in treatment and diagnosis of 
various diseases. Expressed RNA data 102 
and sequenced DNA data 104 may preferably 
be obtained from data published by the 
National Center for Biotechnology Information (NCBI) at 
the National Institutes of Health (NIH), as 
well as from various other published data 
sources. Protein function data 106 may 
preferably be obtained from any one of 
numerous relevant published data sources, 
such as the Online Mendelian Inheritance 
in Man (OMIM) database developed 
by Johns Hopkins University, and also 
published by NCBI. 

[0108] Prior to actual detection of 
bioinformatically detected novel genes 108 
by the bioinformatic gene detection engine 
100, a process of bioinformatic gene 
detection engine training & validation 
designated 110 takes place. This process 
uses the known miRNA genes as a training 
set (some 200 such genes have been found 
to date using biological laboratory means), 
to train the bioinformatic gene detection 
engine 100 to bioinformatically recognize 
miRNA-like genes, and their respective 
potential target binding sites. 
Bioinformatic gene detection engine 
training & validation 110 is further 
described hereinbelow with reference to Fig. 
10. 

[0109] The bioinformatic gene detection engine 
100 comprises several modules which are 
preferably activated sequentially, and are 
described as follows: 

[0110] A non-coding genomic sequence detector 
112 operative to bioinformatically detect 
non-protein coding genomic sequences. 
The non-coding genomic sequence 
detector 112 is further described 
hereinbelow with reference to Figs. 11A 
and 11B. 

[0111] A hairpin detector 114 operative to 
bioinformatically detect genomic "hairpin-shaped" 
sequences, similar to GAM 
FOLDED PRECURSOR of Fig. 8. The hairpin 
detector 114 is further described 
hereinbelow with reference to Figs. 12A 
and 12B. 

[0112] A dicer-cut location detector 116 operative 
to bioinformatically detect the location on 
a hairpin shaped sequence which is 
enzymatically cut by DICER COMPLEX of 
Fig. 8. The dicer-cut location detector 116 
is further described hereinbelow with 
reference to Fig. 13A. 

[0113] A target-gene binding-site detector 118 
operative to bioinformatically detect target 
genes having binding sites, the nucleotide 
sequence of which is partially 
complementary to that of a given genomic 
sequence, such as a sequence cut by DICER 
COMPLEX of Fig. 8. The target-gene 
binding-site detector 118 is further 
described hereinbelow with reference to 
Figs. 14A and 14B. 

[0114] A function & utility analyzer 120 operative 
to analyze function and utility of target 
genes, in order to identify target genes 
which have a significant clinical function 
and utility. The function & utility analyzer 
120 is further described hereinbelow with 
reference to Fig. 15. 

[0115] Hardware implementation of the 
bioinformatic gene detection engine 100 is 
important, since significant computing 
power is preferably required in order to 
perform the computation of bioinformatic 
gene detection engine 100 in reasonable 
time and cost. As an example, it is 
estimated that using one powerful 8-processor 
PC Server, over 30 months of 
computing time (at 24 hours per day) 
would be required in order to detect all 
miRNA genes in human EST data, and their 
respective binding sites. 

[0116] For example, in order to address this 
challenge at reasonable time and cost, a 
preferred embodiment of the present 
invention may comprise a cluster of a large 
number of personal computers (PCs), such 
as 100 PCs (Pentium IV, 1.7GHz, with 40GB 
storage each), connected by Ethernet to 
several strong servers, such as 4 servers 
(2-CPU, Xeon 2.2GHz, with 200GB storage 
each), combined with an 8-processor 
server (8-CPU, Xeon 550MHz w/ 8GB RAM) 
connected via 2 HBA fiber-channels to an 
EMC Clariion 100-disk, 3.6 Terabyte 
storage device. Additionally, preferably an 
efficient database computer program, such 
as Microsoft (TM) SQL-Server database 
computer program, is used and is 
optimized to the specific requirements of 
bioinformatic gene detection engine 100. 
Furthermore, the PCs are preferably 
optimized to operate close to 100% CPU 
usage continuously, as is known in the art. 
Using suitable hardware and software may 
preferably reduce the required calculation 
time in the abovementioned example from 
30 months to 20 days. 
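
By way of non-limiting illustration only, the reduction quoted above may be expressed as an approximate speedup factor. The sketch assumes an average month of 365.25/12 days, which is not stated in the description, and the function name is hypothetical.

```python
DAYS_PER_MONTH = 365.25 / 12  # assumption: average calendar month


def speedup_factor(baseline_months: float, cluster_days: float) -> float:
    """Ratio between the single-server and the cluster running times."""
    return (baseline_months * DAYS_PER_MONTH) / cluster_days


if __name__ == "__main__":
    print(f"approximate speedup: {speedup_factor(30, 20):.1f}x")
```

Under that assumption, the figures quoted above correspond to a reduction in running time by a factor of between 45 and 46.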

[0117] It is appreciated that the abovementioned 
hardware configuration is not meant to be 
limiting, and is given as an illustration 
only. The present invention may be 
implemented in a wide variety of hardware 
and software configurations. 

[0118] The present invention discloses 27238 
novel genes of the GAM group of genes, 
which have been detected 
bioinformatically, as described hereinbelow 
with reference to Figs. 24 through 27262. 
Laboratory confirmation of 4 genes of the 
GAM group of genes is described 
hereinbelow with reference to Figs. 21A 
through 23. 

[0119] Reference is now made to Fig. 10 which is 
a simplified flowchart illustrating operation 
of a mechanism for training of a computer 
system to recognize the novel genes of the 
present invention. This mechanism is a 
preferred implementation of the 
bioinformatic gene detection engine 
training & validation 110 described 
hereinabove with reference to Fig. 9. 

[0120] Bioinformatic gene detection engine 
training & validation 110 of Fig. 9 begins 
by training the bioinformatic gene 
detection engine to recognize known 
miRNA genes, as designated by numeral 
122. This training step comprises hairpin 
detector training & validation 124, further 
described hereinbelow with reference to 
Fig. 12A, dicer-cut location detector 
training & validation 126, further described 
hereinbelow with reference to Figs. 13A and 
13B, and target-gene binding-site detector 
training & validation 128, further described 
hereinbelow with reference to Fig. 14A. 

[0121] Next, the bioinformatic gene detection 
engine 100 is used to bioinformatically 
detect sample novel genes, as designated 
by numeral 130. An example of a sample 
novel gene thus detected is described 
hereinbelow with reference to Fig. 21. 

[0122] Finally, wet lab experiments are preferably 
conducted in order to validate expression 
and preferably function of the sample novel 
genes detected by the bioinformatic gene 
detection engine 100 in the previous step. 
An example of wet-lab validation of the 
abovementioned sample novel gene 
bioinformatically detected by the system is 
described hereinbelow with reference to 
Figs. 22A and 22B. 

[0123] Reference is now made to Fig. 11A which is 
a simplified block diagram of a preferred 
implementation of the non-coding 
genomic sequence detector 112 described 
hereinabove with reference to Fig. 9. Non-protein 
coding genomic sequence detector 
112 of Fig. 9 preferably receives as input 
at least two types of published genomic 
data: expressed RNA data 102, including 
EST data and mRNA data, and sequenced 
DNA data 104. After its initial training, 
indicated by numeral 134, and based on 
the abovementioned input data, the non-protein 
coding genomic sequence detector 
112 produces as output a plurality of non-protein 
coding genomic sequences 136. 
Preferred operation of the non-protein 
coding genomic sequence detector 112 is 
described hereinbelow with reference to 
Fig. 11B. 

[0124] Reference is now made to Fig. 11B which is 
a simplified flowchart illustrating a 
preferred operation of the non-coding 
genomic sequence detector 112 of Fig. 9. 
Detection of non-protein coding genomic 
sequences to be further analyzed by the 
system preferably progresses along 
one of the following two paths. 

[0125] A first path for detecting non-protein 
coding genomic sequences begins by 
receiving a plurality of known RNA 
sequences, such as EST data. Each RNA 
sequence is first compared to all known 
protein-coding sequences, in order to 
select only those RNA sequences which are 
non-protein coding. This can preferably be 
performed by BLAST comparison of the 
RNA sequence to known protein coding 
sequences. The abovementioned BLAST 
comparison to the DNA preferably also 
provides the localization of the RNA on the 
DNA. 

[0126] 

Optionally, an attempt may be made to 
"expand" the non-protein RNA sequences 
thus found, by searching for transcription 
start and end signals, upstream and 
downstream of the location of the RNA on the 
DNA respectively, as is well known in the 
art. 

[0127] A second path for detecting non-protein 
coding genomic sequences starts by 
receiving DNA sequences. The DNA 
sequences are parsed into non-protein 
coding sequences, based on published 
DNA annotation data: extracting those 
DNA sequences which are between known 
protein coding sequences. Next, 
transcription start and end signals are 
sought. If such signals are found, and 
depending on their "strength", probable 
expressed non-protein coding genomic 
sequences are yielded. 
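
By way of non-limiting illustration only, the parsing step of the second path may be sketched as extraction of the gaps between annotated protein coding intervals. The function name and the half-open coordinate convention are assumptions made for this sketch. The sketch also reports the flanking regions before the first and after the last annotated interval, and it does not perform the subsequent search for transcription start and end signals.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates on the DNA


def non_coding_regions(sequence_length: int,
                       coding: List[Interval]) -> List[Interval]:
    """Return the stretches of DNA not covered by any annotated coding interval."""
    gaps: List[Interval] = []
    cursor = 0  # first position not yet known to be covered by a coding interval
    for start, end in sorted(coding):
        if start > cursor:
            gaps.append((cursor, start))
        cursor = max(cursor, end)  # overlapping annotations are merged
    if cursor < sequence_length:
        gaps.append((cursor, sequence_length))
    return gaps


if __name__ == "__main__":
    # Toy annotation invented for this illustration only.
    print(non_coding_regions(100, [(10, 20), (15, 30), (50, 60)]))
```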

[0128] Reference is now made to Fig. 12A which is 
a simplified block diagram of a preferred 
implementation of the hairpin detector 114 
described hereinabove with reference to 
Fig. 9. 

[0129] The goal of the hairpin detector 114 is to 
detect "hairpin" shaped genomic 
sequences, similar to those of known 
miRNA genes. As mentioned hereinabove 
with reference to Fig. 8, a "hairpin" 
genomic sequence refers to a genomic 
sequence which "folds onto itself", forming 
a hairpin like shape, due to the fact that the 
nucleotide sequence of the first half of the 
nucleotide sequence is an accurate or 
partial inversed-reversed sequence of the 
nucleotide sequence of its second half. 

[0130] The hairpin detector 114 of Fig. 9 receives 
as input a plurality of non-protein coding 
genomic sequences 136 of Fig. 11A, and 
after a phase of hairpin detector training & 
validation 124 of Fig. 10, is operative to 
detect and output "hairpin shaped" 
sequences found in the input expressed 
non-protein coding sequences, designated 
by numeral 138. 

[0131] The phase of hairpin detector training & 
validation 124 is an iterative process of 
applying the hairpin detector 114 to 
known hairpin shaped miRNA genes, 
calibrating the hairpin detector 114 such 
that it identifies the training set of known 
hairpins, as well as sequences which are 
similar thereto. Preferred operation of the 
hairpin detector 114 is described 
hereinbelow with reference to Fig. 12B. 

[0132] Reference is now made to Fig. 12B which is 
a simplified flowchart illustrating a 
preferred operation of the hairpin detector 
114 of Fig. 9. 

[0133] A hairpin structure is a two-dimensional 
folding structure, resulting from the 
nucleotide sequence pattern: the 
nucleotide sequence of the first half of the 
hairpin sequence is an inversed-reversed 
sequence of the second half thereof. 
Different methodologies are known in the 
art for detection of various two 
dimensional and three dimensional hairpin 
structures. 

[0134] In a preferred embodiment of the present 
invention, the hairpin detector 114 initially 
calculates possible 2-dimensional (2D) 
folding patterns of a given one of the non-protein 
coding genomic sequences 136, 
preferably using a 2D folding algorithm 
based on free-energy calculation, such as 
the Zuker algorithm, as is well known in 
the art. 

[0135] Next, the hairpin detector 114 analyzes the 
results of the 2D folding, in order to 
determine the presence and location of 
hairpin structures. A 2D folding algorithm 
typically provides as output a listing of the 
base-pairing of the 2D folded shape, i.e. a 
listing of all pairs of nucleotides 
in the sequence which will bond. The goal 
of this second step is to assess this base-pairing 
listing, in order to determine if it 
describes a hairpin type bonding pattern. 
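
By way of non-limiting illustration only, assessment of a base-pairing listing may be sketched as a test that all pairs are nested around a single central loop. The sketch assumes the listing is supplied in the widely used dot-bracket notation, which is not specified in the description. The function names and the thresholds `min_stem_pairs` and `min_loop` are hypothetical. Unpaired positions inside the stem (bulges) are tolerated.

```python
from typing import List, Tuple


def pairs_from_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Convert a dot-bracket string into a list of (i, j) base pairs, i < j."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), position))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_hairpin(pairs: List[Tuple[int, int]],
               min_stem_pairs: int = 3, min_loop: int = 3) -> bool:
    """True if the pairs form one stem closed by a single loop."""
    if len(pairs) < min_stem_pairs:
        return False
    ordered = sorted(pairs)
    # Each successive pair must lie strictly inside the previous one.
    for (i1, j1), (i2, j2) in zip(ordered, ordered[1:]):
        if not (i1 < i2 and j2 < j1):
            return False
    inner_i, inner_j = ordered[-1]
    return inner_j - inner_i - 1 >= min_loop


if __name__ == "__main__":
    print(is_hairpin(pairs_from_dot_bracket("((((...))))")))
    print(is_hairpin(pairs_from_dot_bracket("((..))((..))")))
```

The first structure is a single stem with a three-nucleotide loop and is accepted, whereas the second structure consists of two separate stems and is rejected.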

[0136] The hairpin detector 114 then assesses those 
hairpin structures found by the previous 
step, comparing them to hairpins of known 
miRNA genes, using various parameters 
such as length, free-energy, amount and 
type of mismatches, etc. Only hairpins that 
bear statistically significant resemblance to 
the population of hairpins of known 
miRNAs, according to the abovementioned 
parameters, are accepted. 

[0137] Lastly, the hairpin detector 114 attempts 
to select those hairpin structures which are 
as stable as the hairpins of known miRNA 
genes. This may be achieved in various 
manners. A preferred embodiment of the 
present invention utilizes the following 
methodology comprising three steps: 

[0138] 

First, the hairpin detector 114 attempts to 
group potential hairpins into "families" of 
closely related hairpins. As is known in the 
art, a free-energy calculation algorithm 
typically provides multiple "versions", each 
describing a different possible 2D folding 
pattern for the given genomic sequence, 
and the free energy of such possible 
folding. The hairpin detector 114 therefore 
preferably assesses all hairpins found on 
all "versions", grouping hairpins which 
appear in different versions, but which 
share near identical locations, into a 
common "family" of hairpins. For example, 
all hairpins in different versions, the centers 
of which are within 7 nucleotides of each 
other, may preferably be grouped to a 
single "family". 

[0139] Next, hairpin "families" are assessed, in 
order to select only those families which 
represent hairpins that are as stable as 
those of known miRNA hairpins. For 
example, preferably only families which are 
represented in at least 65% of the free- 
energy calculation 2D folding versions, are 
considered stable. 

[0140] Finally, an attempt is made to select the 
most suitable hairpin from each selected 
family. For example, preferably the hairpin 
which appears in more versions than other 
hairpins, and in versions the free-energy 
of which is lower, may be selected. 
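
By way of non-limiting illustration only, the three steps described above may be sketched as follows. The data structure and function names are hypothetical. The grouping uses a single greedy pass over hairpins sorted by center, which is only one possible reading of "within 7 nucleotides of each other", and ties between equally frequent hairpins are broken by the lowest free energy.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hairpin:
    version: int        # index of the 2D folding "version" the hairpin appears in
    start: int
    end: int
    free_energy: float  # free energy of that folding version (lower is more stable)

    @property
    def center(self) -> float:
        return (self.start + self.end) / 2


def group_into_families(hairpins: List[Hairpin],
                        max_center_distance: float = 7) -> List[List[Hairpin]]:
    """Step 1: group hairpins whose centers lie close to the family's first member."""
    families: List[List[Hairpin]] = []
    for hairpin in sorted(hairpins, key=lambda h: h.center):
        if families and hairpin.center - families[-1][0].center <= max_center_distance:
            families[-1].append(hairpin)
        else:
            families.append([hairpin])
    return families


def stable_families(families: List[List[Hairpin]], n_versions: int,
                    min_fraction: float = 0.65) -> List[List[Hairpin]]:
    """Step 2: keep families represented in at least min_fraction of the versions."""
    return [family for family in families
            if len({h.version for h in family}) / n_versions >= min_fraction]


def representative(family: List[Hairpin]) -> Hairpin:
    """Step 3: pick the hairpin seen in most versions, preferring lower free energy."""
    by_location = defaultdict(list)
    for hairpin in family:
        by_location[(hairpin.start, hairpin.end)].append(hairpin)

    def score(members: List[Hairpin]):
        return (len({h.version for h in members}),
                -min(h.free_energy for h in members))

    best = max(by_location.values(), key=score)
    return min(best, key=lambda h: h.free_energy)
```

As a usage example on invented values, four folding versions containing hairpins at (10, 50) in versions 0 and 1, at (12, 52) in version 2, and at (200, 240) in version 0 yield two families, of which only the first is represented in at least 65% of the versions, and its representative is the hairpin at (10, 50).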

[0141] Reference is now made to Fig. 13A which is 
a simplified block diagram of a preferred 
implementation of the dicer-cut location 
detector 116 described hereinabove with 
reference to Fig. 9. 

[0142] The goal of the dicer-cut location detector 
116 is to detect the location in which 
DICER COMPLEX of Fig. 8, comprising the 
enzyme Dicer, would "dice" the given 
hairpin sequence, similar to GAM FOLDED 
PRECURSOR RNA, yielding GAM RNA, both 
of Fig. 8. 

[0143] The dicer-cut location detector 116 of Fig. 
9 therefore receives as input a plurality of 
hairpins on genomic sequences 138 of Fig. 
12A, which were calculated by the previous 
step, and after a phase of dicer-cut 
location detector training & validation 126 
of Fig. 10, is operative to detect a 
respective plurality of dicer-cut sequences 
from hairpins 140, one for each hairpin. 

[0144] In a preferred embodiment of the present 
invention, the dicer-cut location detector 
116 preferably uses a combination of 
neural networks, Bayesian networks, 
Markovian modeling, and Support Vector 
Machines (SVMs) trained on the known 
dicer-cut locations of known miRNA genes, 
in order to detect dicer-cut locations. 
Dicer-cut location detector training & 
validation 126 is further described 
hereinbelow with reference to Fig. 13B. 

[0145] Reference is now made to Fig. 13B which 
is a simplified flowchart illustrating a 
preferred implementation of dicer-cut 
location detector training & validation 126 
of Fig. 10. Dicer-cut location detector 116 
first preprocesses known miRNA hairpins 
and their respective dicer-cut locations, so 
as to be able to properly analyze them and 
train the detection system accordingly: 

[0146] The folding pattern is calculated for each 
known miRNA, preferably based on free- 
energy calculation, and the size of the 
hairpin, the size of the loop at the center 
of the hairpin, and "bulges" (i.e. 
mismatched base-pairs) in the folded 
hairpin are noted. 

[0147] The dicer-cut location, which is known for 
known miRNA genes, is noted relative to 
the above, as well as to the nucleotides in 
each location along the hairpin. Frequency 
of identity of nucleotides, and nucleotide-pairing, 
relative to their location in the 
hairpin, and relative to the known dicer-cut 
location in the known miRNA genes is 
analyzed and modeled. 
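
By way of non-limiting illustration only, the frequency of nucleotide identity relative to a known dicer-cut location may be tabulated as sketched below. The function name and the `window` parameter are hypothetical, a training example is assumed to be a pair of a hairpin sequence and a zero-based cut position, and nucleotide-pairing frequencies are omitted for brevity.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def nucleotide_frequencies(examples: List[Tuple[str, int]],
                           window: int = 2) -> Dict[int, Dict[str, float]]:
    """Relative nucleotide frequencies at each offset from the dicer-cut position."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for sequence, cut in examples:
        for offset in range(-window, window + 1):
            position = cut + offset
            if 0 <= position < len(sequence):
                counts[offset][sequence[position]] += 1
    return {
        offset: {nt: n / sum(counter.values()) for nt, n in counter.items()}
        for offset, counter in counts.items()
    }


if __name__ == "__main__":
    # Toy sequences invented for this illustration; they are not real miRNA data.
    toy_examples = [("ACGUACGU", 3), ("GGGUCCCC", 3)]
    for offset, frequencies in sorted(nucleotide_frequencies(toy_examples, 1).items()):
        print(offset, frequencies)
```

A table of this kind is one simple form of the position-relative model referred to above. The detector itself may rely on richer models, as described below.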

[0148] Different techniques are well known in the 
art for analysis of existing patterns from a 
given "training set" of species belonging to 
a genus, which techniques are then 
capable, to a certain degree, of detecting 
similar patterns in other species not 
belonging to the training-set genus. Such 
techniques include, but are not limited to, 
neural networks, Bayesian networks, 
Support Vector Machines (SVM), Genetic 
Algorithms, Markovian modeling, and 
others, as is well known in the art. 

[0149] Using such techniques, preferably a
combination of several of the above
techniques, the known hairpins are
represented as the input and output layers
of several different networks (such as
neural, Bayesian, or SVM). Both the
nucleotide and the "bulge" (i.e. nucleotide
pairing or mismatch) are represented for
each position in the hairpin at the input
layer, and a corresponding true/false flag
at each position, indicating whether it was
diced by dicer, is represented at the output
layer. Multiple networks are preferably used
concurrently, and the results therefrom are
integrated and further optimized. Markovian
modeling may also be used to validate the
results and enhance their accuracy. Finally,
the bioinformatic detection of the dicer-cut
location of a sample of novel genes is
confirmed by wet-lab experimentation.
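The input and output representation described above might look like the following sketch. It assumes a one-hot nucleotide code plus a single paired/unpaired flag per position, and a simple averaging rule for combining several models; these encodings, the names `encode_hairpin` and `integrate_predictions`, and the 0.5 threshold are illustrative assumptions, not details given in the present specification.

```python
NUCLEOTIDES = "ACGU"


def encode_hairpin(sequence: str, structure: str, cut_positions: set):
    """Return (inputs, outputs) for one known hairpin.

    inputs[i]  = one-hot nucleotide (4 values) + paired flag (1 value)
    outputs[i] = 1 if position i is a known dicer-cut location, else 0
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    inputs, outputs = [], []
    for i, (nt, mark) in enumerate(zip(sequence.upper(), structure)):
        one_hot = [1 if nt == n else 0 for n in NUCLEOTIDES]
        paired = 0 if mark == "." else 1
        inputs.append(one_hot + [paired])
        outputs.append(1 if i in cut_positions else 0)
    return inputs, outputs


def integrate_predictions(per_model_scores, threshold=0.5):
    """Average per-position scores from several models, then threshold."""
    n_models = len(per_model_scores)
    averaged = [sum(col) / n_models for col in zip(*per_model_scores)]
    return [score >= threshold for score in averaged]
```

Any of the techniques named above could consume these vectors; averaging is only one of many ways in which the results of multiple networks could be integrated.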

[0150] Reference is now made to Fig. 14A which is
a simplified block diagram of a preferred
implementation of the target-gene
binding-site detector 118 described
hereinabove with reference to Fig. 9. The
goal of the target-gene binding-site
detector 118 is to detect a BINDING SITE of
Fig. 8, located in an untranslated region of
the RNA of a known gene, the nucleotide
sequence of which BINDING SITE is at least
partially complementary to that of a GAM
RNA of Fig. 8, thereby determining that the
abovementioned known gene is a target
gene of GAM of Fig. 8.

[0151] The target-gene binding-site detector 118
of Fig. 9 therefore receives as input a
plurality of dicer-cut sequences from
hairpins 140 of Fig. 13A which were
calculated by the previous step, and a
plurality of potential target gene
sequences 142 which derive from sequenced
DNA data 104 of Fig. 9, and after a phase of
target-gene binding-site detector training
& validation 128 of Fig. 10, is operative to
detect target-genes having binding site/s
144 the nucleotide sequence of which is at
least partially complementary to that of
each of the plurality of dicer-cut
sequences from hairpins 140. Preferred
operation of the target-gene binding-site
detector is further described hereinbelow
with reference to Fig. 14B.

[0152] Reference is now made to Fig. 14B which is
a simplified flowchart illustrating a
preferred operation of the target-gene
binding-site detector 118 of Fig. 9. In a
preferred embodiment of the present
invention, the target-gene binding-site
detector 118 first performs a BLAST
comparison of the nucleotide sequence of
each of the plurality of dicer-cut
sequences from hairpins 140, to the
potential target gene sequences 142, in
order to find crude potential matches. BLAST
results are then filtered to results which
are similar to those of known binding sites
(e.g. binding sites of miRNA genes Lin-4
and Let-7 to target genes Lin-14, Lin-41,
Lin-28 etc.). Next, the binding site is
expanded, checking whether nucleotide
sequences immediately adjacent to the
binding site found by BLAST may improve
the match. Free-energy and spatial
structure are then computed for suitable
binding sites. The results are analyzed,
selecting only those binding sites which
have free-energy and spatial structure
similar to that of known binding sites.
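The first two stages of this flow, crude matching followed by expansion of the match, can be illustrated with the sketch below. Note that it does not use BLAST: an exact search for a short complementary stretch is swapped in as a stand-in for the crude matching step, and no free-energy or structure calculation is performed. The function names, the `seed_len` parameter and the scoring rule are assumptions made for illustration only.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5'->3'."""
    return "".join(COMPLEMENT[n] for n in reversed(rna))


def find_seed_sites(gam_rna: str, utr: str, seed_len: int = 7):
    """Return UTR start indices pairing perfectly with the 5' end of gam_rna."""
    site = reverse_complement(gam_rna[:seed_len])
    hits, start = [], utr.find(site)
    while start != -1:
        hits.append(start)
        start = utr.find(site, start + 1)
    return hits


def extend_site(gam_rna: str, utr: str, start: int, seed_len: int = 7) -> float:
    """Expand a crude hit and return the fraction of gam_rna that is paired.

    Pairing is antiparallel, so the base after the seed in gam_rna faces
    the UTR base immediately upstream of the seed site.
    """
    matches = seed_len
    utr_pos = start - 1
    for nt in gam_rna[seed_len:]:
        if utr_pos < 0:
            break
        if COMPLEMENT[nt] == utr[utr_pos]:
            matches += 1
        utr_pos -= 1
    return matches / len(gam_rna)
```

A candidate site would be kept only if its expanded score, and in a fuller implementation its free-energy and spatial structure, resemble those of known binding sites.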

[0153] Reference is now made to Fig. 15 which is
a simplified flowchart illustrating a
preferred operation of the function & utility
analyzer 120 described hereinabove with
reference to Fig. 9. The goal of the
function & utility analyzer 120 is to
determine if a potential target gene is in
fact a valid clinically useful target gene.
Since a potential novel GAM gene binding a
binding site in the UTR of a target gene is
understood to inhibit expression of that
target gene, if that target gene is
shown to have a valid clinical utility, then
it follows that the potential
novel gene itself also has a valid useful
function which is the opposite of that of
the target gene.

[0154] The function & utility analyzer 120
preferably receives as input a plurality of
potential novel target genes having
binding-site/s 144, generated by the
target-gene binding-site detector 118,
both of Fig. 14A. Each potential gene is
evaluated as follows:

[0155] First, the system checks to see if the
function of the potential target gene is
scientifically well established. Preferably,
this can be achieved bioinformatically by
searching various published data sources
presenting information on known function
of proteins. Many such data sources exist
and are published, as is well known in the
art.

[0156] Next, for those target genes the function
of which is scientifically known and is well
documented, the system then checks if
scientific research data exists which links
them to known diseases. For example, a
preferred embodiment of the present
invention utilizes the OMIM(TM) database
published by NCBI, which summarizes
research publications relating to genes
which have been shown to be associated
with diseases.

[0157] Finally, the specific possible utility of the
target gene is evaluated. While this process
too may be facilitated by bioinformatic
means, it might require human evaluation
of published scientific research regarding
the target gene, in order to determine the
utility of the target gene to the diagnosis
and/or treatment of a specific disease. Only
potential novel genes, the target-genes of
which have passed all three examinations,
are accepted as novel genes.
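The three examinations above amount to a sequential filter, sketched below. The record layout, the names `TargetGene`, `passes_all_examinations` and `accepted_novel_genes`, and the rule that one qualifying target gene is enough to accept a potential novel gene are all assumptions made for illustration; in particular, the `utility_confirmed` field stands in for the outcome of the human evaluation mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class TargetGene:
    name: str
    known_function: str = ""                        # empty if not well established
    linked_diseases: list = field(default_factory=list)
    utility_confirmed: bool = False                 # result of expert review


def passes_all_examinations(target: TargetGene) -> bool:
    """Apply the three checks in order, stopping at the first failure."""
    if not target.known_function:       # 1. function well established?
        return False
    if not target.linked_diseases:      # 2. research links it to a disease?
        return False
    return target.utility_confirmed     # 3. specific utility evaluated?


def accepted_novel_genes(candidates: dict) -> list:
    """candidates maps a potential novel gene name to its target genes.

    Assumption: a gene is accepted if at least one of its targets passes.
    """
    return [gene for gene, targets in candidates.items()
            if any(passes_all_examinations(t) for t in targets)]
```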

[0158] Reference is now made to FIG. 16, which is
a simplified diagram describing a novel
bioinformatically detected group of
regulatory genes, referred to here as
Genomic Record (GR) genes, that encode
an "operon-like" cluster of novel miRNA-
like genes, each modulating expression of
a plurality of target genes, the function
and utility of which target genes is known.

[0159] GR GENE (Genomic Record Gene) is a gene of
a novel, bioinformatically detected group
of regulatory, non protein coding, RNA
genes. The method by which GR is
detected is described hereinabove with
reference to FIGS. 6-15.

[0160] GR GENE encodes an RNA molecule,
typically several hundred nucleotides long,
designated GR PRECURSOR RNA.

[0161] GR PRECURSOR RNA folds spatially, as
illustrated by GR FOLDED PRECURSOR RNA,
into a plurality of what is known in the art
as "hairpin" structures. The nucleotide
sequence of GR PRECURSOR RNA
comprises a plurality of segments, the first
half of each such segment having a
nucleotide sequence which is at least a
partial inversed-reversed sequence of the
second half thereof, thereby causing
formation of a plurality of "hairpin"
structures, as is well known in the art.
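The "partial inversed-reversed" relationship between the two halves of a segment can be checked with a short sketch such as the one below. The function name `stem_pairing_fraction`, the fixed `loop_len` of 4 and the decision to count G-U wobble pairs as paired are assumptions for illustration; a real fold would be determined by free-energy calculation rather than by this position-by-position comparison.

```python
# Watson-Crick pairs plus G-U wobble pairs (counting wobble is an assumption).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pairing_fraction(segment: str, loop_len: int = 4) -> float:
    """Fraction of first-half bases that can pair with the second half.

    The second half is read in reverse, so position i of the 5' arm is
    compared with the base it would face in a simple hairpin.
    """
    arm_len = (len(segment) - loop_len) // 2
    if arm_len <= 0:
        raise ValueError("segment too short for the given loop length")
    five_prime_arm = segment[:arm_len]
    three_prime_arm = segment[len(segment) - arm_len:][::-1]
    paired = sum((a, b) in PAIRS for a, b in zip(five_prime_arm, three_prime_arm))
    return paired / arm_len
```

A value near 1.0 indicates that the second half is close to a full inversed-reversed copy of the first; lower values correspond to the partial case mentioned above.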

[0162] GR FOLDED PRECURSOR RNA is naturally
processed by cellular enzymatic activity,
into 3 separate hairpin shaped RNA
segments, each corresponding to GAM
PRECURSOR RNA of Fig. 8, designated
GAM1 PRECURSOR, GAM2 PRECURSOR and
GAM3 PRECURSOR respectively.

[0163] The abovementioned GAM precursors are
diced by Dicer of FIG. 8, yielding short RNA
segments of about 22 nucleotides in
length, each corresponding to GAM RNA of
FIG. 8, designated GAM1, GAM2 and GAM3
respectively.

[0164] GAM1, GAM2 and GAM3 each bind
complementarily to binding sites located in
untranslated regions of respective target
genes, designated GAM1-TARGET RNA,
GAM2-TARGET RNA and GAM3-TARGET
RNA respectively. This binding inhibits
translation of the respective target proteins
designated GAM1-TARGET PROTEIN,
GAM2-TARGET PROTEIN and GAM3-
TARGET PROTEIN respectively.

[0165] The structure of GAM genes comprised in a
GR GENE, and their mode of modulation of
expression of their respective target genes
is described hereinabove with reference to
Fig. 8. The bioinformatic approach to
detection of GAM genes comprised in a GR
GENE is described hereinabove with
reference to Figs. 9 through 15. The
present invention discloses 147 novel
genes of the GR group of genes, which
have been detected bioinformatically, as
described hereinbelow with reference to
Figs. 551 through 697. Laboratory
confirmation of 3 genes of the GR group of
genes is described hereinbelow with
reference to Figs. 21A through 23.

[0166] In summary, the current invention
discloses a very large number of novel GR
genes, each of which encodes a plurality of
GAM genes, which in turn may modulate
expression of a plurality of target proteins.
It is appreciated therefore that the function
of GR genes is in fact similar to that of the
Genomic Records concept of the present
invention addressing the differentiation
enigma, described hereinabove with
reference to Fig. 7. Some GR genes of the
present invention are described in more
detail with reference to Table 4.

[0167] Reference is now made to Fig. 17 which is
a simplified diagram illustrating a mode by
which genes of a novel group of operon-
like genes, described hereinabove with
reference to Fig. 16 of the present
invention, modulate expression of other
such genes, in a cascading manner.

[0168] GR1 GENE and GR2 GENE are two genes of
the novel group of operon-like genes
designated GR of Fig. 16. As is typical of
genes of the GR group of genes, GR1 and
GR2 each encode a long RNA precursor,
which in turn folds into a folded RNA
precursor comprising multiple hairpin
shapes, and is cut into respective separate
hairpin shaped RNA segments, each of
which RNA segments being diced to yield a
gene of a group of genes designated GAM
of Fig. 8. In this manner GR1 yields GAM1,
GAM2 and GAM3, and GR2 yields GAM4,
GAM5 and GAM6.

[0169] As Fig. 17 shows, GAM3, which derives
from GR1, binds a binding site located
adjacent to GR2 GENE, thus modulating
expression of GR2, thereby invoking
expression of GAM4, GAM5 and GAM6
which derive from GR2.

[0170] It is appreciated that the mode of
modulation of expression presented by
Fig. 17 enables an unlimited "cascading
effect" in which a GR gene comprises
multiple GAM genes, each of which may
modulate expression of other GR genes,
each such GR gene comprising additional
GAM genes, etc., whereby eventually
certain GAM genes modulate expression of
target proteins. This mechanism is in
accord with the conceptual model of the
present invention addressing the
differentiation enigma, described
hereinabove with specific reference to Figs.
6 and 7.

[0171] Reference is now made to Fig. 18 which is
a block diagram illustrating an overview of
a methodology for finding novel genes and
operon-like genes of the present
invention, and their respective functions.

[0172] According to a preferred embodiment of
the present invention, the methodology for
finding novel genes of the present
invention and their function comprises
the following major steps:

[0173] First, genes of the novel group of genes of
the present invention, referred to here as
GAM genes, are located and their function
elicited by detecting target proteins they
bind and the function of those target
proteins, as described hereinabove with
reference to Figs. 9 through 15.

[0174] Next, genes of a novel group of operon-
like genes of the present invention,
referred to here as GR genes, are located,
by locating clusters of proximally located
GAM genes, based on the previous step.
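Locating clusters of proximally located GAM genes can be sketched as a single pass over sorted genomic coordinates. The function name `cluster_gam_genes`, the `max_gap` of 1000 nucleotides and the minimum cluster size of two are illustrative assumptions; the present specification does not state a distance threshold here.

```python
def cluster_gam_genes(positions, max_gap=1000, min_cluster_size=2):
    """Group GAM gene start coordinates (one chromosome, one strand).

    Consecutive genes closer than max_gap fall into the same cluster;
    clusters smaller than min_cluster_size are discarded.
    """
    clusters, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_cluster_size:
                clusters.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_cluster_size:
        clusters.append(current)
    return clusters
```

Each returned cluster is a candidate GR gene comprising the GAM genes at those coordinates.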

[0175] Consequently, the hierarchy of GR and
GAM genes is elicited: binding sites for
non-protein-binding GAM genes
comprised in each GR gene found, are
sought adjacent to other GR genes. When
found, such a binding site indicates
the connection between the GAM and the
GR the expression of which it modulates,
and thus the hierarchy of the GR genes and
the GAM genes they comprise.

[0176] Lastly, the function of GR genes and GAM
genes which are "high" in the hierarchy, i.e.
GAM genes which modulate expression of
other GR genes rather than directly
modulating expression of target proteins,
may be deduced. A preferred approach is
as follows: The function of protein-
modulating GAM genes is deducible from
the proteins which they modulate,
provided that the function of these target
proteins is known. The function of
"higher" GAM genes may be deduced by
comparing the function of protein-
modulating GAM genes, with the
hierarchical relationships by which the
"higher" GAM genes are connected to the
protein-modulating GAM genes. For
example, given a group of several protein-
modulating GAM genes, which collectively
cause a protein expression pattern typical
of a certain cell-type, then a "higher" GAM
gene is sought which modulates
expression of GR genes which perhaps
modulate expression of other genes which
eventually modulate expression of the
given group of protein-modulating GAM
genes. The "higher" GAM gene found in
this manner, is taken to be responsible for
differentiation of that cell-type, as per the
conceptual model of the invention
described hereinabove with reference to
Fig. 6.
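The search for a "higher" GAM gene can be viewed as a reachability question on the GR/GAM hierarchy. The sketch below assumes the hierarchy is held in two plain dictionaries, `gam_to_gr` (which GR genes a GAM gene modulates) and `gr_to_gams` (which GAM genes a GR gene comprises); these structures and the function names are assumptions for illustration, with the example data taken from the GR1/GR2 arrangement of Fig. 17.

```python
def downstream_gams(gam, gam_to_gr, gr_to_gams, seen=None):
    """Return all GAM genes reachable from `gam` via the GR genes it modulates."""
    seen = set() if seen is None else seen
    for gr in gam_to_gr.get(gam, ()):
        for child in gr_to_gams.get(gr, ()):
            if child not in seen:          # guards against cycles
                seen.add(child)
                downstream_gams(child, gam_to_gr, gr_to_gams, seen)
    return seen


def higher_gams_for(cell_type_gams, gam_to_gr, gr_to_gams):
    """Return GAM genes whose cascade covers every GAM in cell_type_gams."""
    wanted = set(cell_type_gams)
    return sorted(g for g in gam_to_gr
                  if wanted <= downstream_gams(g, gam_to_gr, gr_to_gams))


# Example hierarchy following Fig. 17: GAM3 (from GR1) modulates GR2.
gr_to_gams = {"GR1": ["GAM1", "GAM2", "GAM3"], "GR2": ["GAM4", "GAM5", "GAM6"]}
gam_to_gr = {"GAM3": ["GR2"]}
print(higher_gams_for(["GAM4", "GAM5"], gam_to_gr, gr_to_gams))
```

With this example data, GAM3 is the only gene whose cascade reaches both GAM4 and GAM5, so it is the candidate "higher" GAM gene for that group.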

[0177] Reference is now made to Fig. 19 which is
a block diagram illustrating different
utilities of genes of the novel group of
genes of the present invention referred to
here as GAM genes and GR genes.

[0178] The present invention discloses a first
plurality of novel genes referred to here as
GAM genes, and a second plurality of
operon-like genes referred to here as GR
genes, each of the GR genes encoding a
plurality of GAM genes. The present
invention further discloses a very large
number of known target-genes, which are
bound by, and the expression of which is
modulated by each of the novel genes of
the present invention. Published scientific
data referenced by the present invention
provides specific, substantial, and credible
evidence that the abovementioned target
genes modulated by novel genes of the
present invention, are associated with
various diseases. Specific novel genes of
the present invention, target genes thereof
and diseases associated therewith, are
described hereinbelow with reference to
Figs. 24 through 27260. It is therefore
appreciated that a function of GAM genes
and GR genes of the present invention is
modulation of expression of target genes
related to known diseases, and that
therefore utilities of novel genes of the
present invention include diagnosis and
treatment of the abovementioned diseases.
Fig. 19 describes various types of
diagnostic and therapeutic utilities of novel
genes of the present invention.

[0179] A utility of novel genes of the present
invention is detection of GAM genes and of
GR genes. It is appreciated that since GAM
genes and GR genes modulate expression
of disease related target genes,
detection of expression of GAM genes in
clinical scenarios associated with said
diseases is a specific, substantial and
credible utility. Diagnosis of novel genes of
the present invention may preferably be
implemented by RNA expression detection
techniques, including but not limited to
biochips, as is well known in the art.
Diagnosis of expression of genes of the
present invention may be useful for
research purposes, in order to further
understand the connection between the
novel genes of the present invention and
the abovementioned related diseases, for
disease diagnosis and prevention
purposes, and for monitoring disease
progress.

[0180] Another utility of novel genes of the
present invention is anti-GAM gene
therapy, a mode of therapy which allows
up regulation of a disease related target-
gene of a novel GAM gene of the present
invention, by lowering levels of the novel
GAM gene which naturally inhibits
expression of that target gene. This mode
of therapy is particularly useful with
respect to target genes which have been
shown to be under-expressed in
association with a specific disease. Anti-
GAM gene therapy is further discussed
hereinbelow with reference to Figs. 20A
and 20B.

[0181] A further utility of novel genes of the
present invention is GAM replacement
therapy, a mode of therapy which achieves
down regulation of a disease related
target-gene of a novel GAM gene of the
present invention, by raising levels of the
GAM gene which naturally inhibits
expression of that target gene. This mode
of therapy is particularly useful with
respect to target genes which have been
shown to be over-expressed in association
with a specific disease. GAM replacement
therapy involves introduction of
supplementary GAM gene products into a
cell, or stimulation of a cell to produce
excess GAM gene products. GAM
replacement therapy may preferably be
achieved by transfecting cells with an
artificial DNA molecule encoding a GAM
gene, which causes the cells to produce
the GAM gene product, as is well known in
the art.

[0182] Yet a further utility of novel genes of the
present invention is modified GAM therapy.
Disease conditions are likely to exist, in
which a mutation in a binding site of a
GAM gene prevents the natural GAM gene
from effectively binding and inhibiting a
disease related target-gene, causing up
regulation of that target gene, and thereby
contributing to the disease pathology. In
such conditions, a modified GAM gene is
designed which effectively binds the
mutated GAM binding site, i.e. is an
effective anti-sense of the mutated GAM
binding site, and is introduced in disease
affected cells. Modified GAM therapy is
preferably achieved by transfecting cells
with an artificial DNA molecule encoding
the modified GAM gene, which causes the
cells to produce the modified GAM gene
product, as is well known in the art.

[0183] An additional utility of novel genes of the
present invention is induced cellular
differentiation therapy. An aspect of the
present invention is finding genes which
determine cellular differentiation, as
described hereinabove with reference to
Fig. 18. Induced cellular differentiation
therapy comprises transfection of cells with
such GAM genes thereby determining their
differentiation as desired. It is appreciated
that this approach may be widely
applicable, inter alia as a means for auto
transplantation: harvesting cells of one
cell-type from a patient, modifying their
differentiation as desired, and then
transplanting them back into the patient. It
is further appreciated that this approach
may also be utilized to modify cell
differentiation in vivo, by transfecting cells
in a genetically diseased tissue with a cell-
differentiation determining GAM gene,
thus stimulating these cells to differentiate
appropriately.

[0184] Reference is now made to Figs. 20A and
20B, simplified diagrams which when taken
together illustrate anti-GAM gene therapy
mentioned hereinabove with reference to
Fig. 19. A utility of novel genes of the
present invention is anti-GAM gene
therapy, a mode of therapy which allows
up regulation of a disease related target-
gene of a novel GAM gene of the present
invention, by lowering levels of the novel
GAM gene which naturally inhibits
expression of that target gene. Fig. 20A
shows a normal GAM gene, inhibiting
translation of a target gene of GAM gene,
by binding to a BINDING SITE found in an
untranslated region of TARGET RNA, as
described hereinabove with reference to
Fig. 8.

[0185] Fig. 20B shows an example of anti-GAM
gene therapy. ANTI-GAM RNA is a short
artificial RNA molecule the sequence of
which is an anti-sense of GAM RNA. Anti-
GAM treatment comprises transfecting
diseased cells with ANTI-GAM RNA, or with
a DNA encoding it. The ANTI-GAM
RNA binds the natural GAM RNA, thereby
preventing binding of natural GAM RNA to
its BINDING SITE. This prevents natural
translation inhibition of TARGET RNA by
GAM RNA, thereby up regulating
expression of TARGET PROTEIN.
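Deriving an anti-sense sequence of the kind described above is a reverse-complement operation, sketched below. The function name `antisense_rna` is an assumption for illustration, and a perfectly complementary anti-sense is assumed; the example sequence in the usage line is arbitrary and is not a GAM RNA of the present invention.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(rna: str) -> str:
    """Return the fully complementary anti-sense of an RNA written 5'->3'."""
    rna = rna.upper()
    unknown = set(rna) - set(COMPLEMENT)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    # Pairing is antiparallel, so the complement is read in reverse.
    return "".join(COMPLEMENT[n] for n in reversed(rna))


print(antisense_rna("AUGGC"))  # arbitrary example input
```

Applying the function twice returns the original sequence, which is a convenient sanity check on any designed anti-sense molecule.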

[0186] It is appreciated that anti-GAM gene
therapy is particularly useful with respect
to target genes which have been shown to
be under-expressed in association with a
specific disease.

[0187] Reference is now made to Fig. 21A which is
an annotated sequence of an EST
comprising a novel gene detected by the
gene detection system of the present
invention. Fig. 21A shows the nucleotide
sequence of a known human non-protein
coding EST (Expressed Sequence Tag),
identified as EST72223. It is appreciated
that the sequence of this EST comprises
sequences of one known miRNA gene,
identified as MIR98, and of one novel GAM
gene, referred to here as GAM24, detected
by the bioinformatic gene detection system
of the present invention, described
hereinabove with reference to Fig. 9.

[0188] Reference is now made to Figs. 21B and
21C that are pictures of laboratory results,
which when taken together demonstrate
laboratory confirmation of expression of
the bioinformatically detected novel gene
of Fig. 21A. Reference is now made to Fig.
21B which is a Northern blot analysis of
MIR-98 and EST72223 transcripts. MIR-98
and EST72223 were reacted with MIR-98
and GAM24 probes as indicated in the
figure. It is appreciated that the probes of
both MIR-98 and GAM24 reacted with
EST72223, indicating that EST72223
contains the sequences of MIR-98 and of
GAM24. It is further appreciated that the
probe of GAM24 does not cross-react with
MIR-98.

[0189] Reference is now made to Fig. 21C. A
Northern blot analysis of EST72223 and
MIR-98 transfections was performed,
subsequently marking RNA by the MIR-98
and GAM24 probes. Left, Northern reacted
with MIR-98; Right, Northern reacted with
GAM24. The molecular sizes of EST72223,
MIR-98 and GAM24 are indicated by
arrows. Hela are control cells that have not
been introduced to exogenous RNA. EST
and MIR-98 Transfections are RNA
obtained from Hela transfected with
EST72223 and MIR-98, respectively. MIR-
98 and EST are the transcripts used for the
transfection experiment. The results
indicate that EST72223, when transfected
into Hela cells, is cut yielding known
miRNA gene MIR-98 and novel miRNA
gene GAM24.

[0190] Reference is now made to Fig. 21D, which
is a Northern blot of a lysate experiment
with MIR-98 and GAM24. Northern blot
analysis of hairpins in EST72223. Left,
Northern reacted with predicted Mir-98
hairpin probe; Right, Northern reacted with
predicted GAM24 hairpin probe. The
molecular size of EST is indicated by arrow.
The molecular sizes of Mir-98 and GAM24
are 80nt and 100nt, respectively, as
indicated by arrows. The 22nt molecular
marker is indicated by arrow. 1-Hela
lysate; 2-EST incubated 4h with Hela
lysate; 3-EST without lysate; 4-Mir
transcript incubated 4h with Hela lysate;
5-Mir transcript incubated overnight with
Hela lysate; 6-Mir transcript without
lysate; 7-RNA extracted from Hela cells
following transfection with Mir transcript.

[0191] Technical methods used in experiments,
the results of which are depicted in Figs.
21B, 21C and 21D are as follows:

[0192] Transcript preparations: Digoxigenin (DIG) labeled [...]
transcripts were prepared from EST72223 (TIC[...]
and predicted precursor hairpins by using a D[...]
labeling kit (Roche Molecular Biochemicals) according to the
manufacturer's protocol. Briefly, PCR products [...]
promoter at the 5' end or T3 promoter at the [...]
prepared from each DNA in order to use it as [...]
prepare sense and antisense transcripts, respectively. [...]
98 was amplified using EST72223 as a template [...]
T7miR98 forward primer: 5'-
TAATACGACTCACTATAGGGTGAGGTAGTAAG[...]
3' and T3miR98 reverse primer: 5'-
AATTAACCCTCACTAAAGGGAAAGTAGTAAGT[...]
3'. EST72223 was amplified with T7-EST72223 [...]
primer: 5'-
TAATACGACTCACTATAGGCCCTTATTAGAGCA[...]
3' and T3-EST72223 reverse primer: 5'-
AATTAACCCTCACTAAAGG I I I I I I I I I CCTGA[...]
Bet-4 was amplified using EST72223 as a template [...]
4 forward primer: 5'-GAGGCAGGAGAATTGCT[...]
T3-EST72223 reverse primer: 5'-
AATTAACCCTCACTAAAGGCCTGAGACAGAGTC[...]

[0193] The PCR products were cleaned and used
for DIG-labeled or unlabeled transcription
reactions with the appropriate polymerase.
For transfection experiments, CAP reaction
was performed by using a mMessage
mMachine kit (Ambion).

[0194] Transfection procedure: Transfection of
Hela cells was performed by using
TransMessenger reagent (Qiagen)
according to the manufacturer's protocol.
Briefly, Hela cells were seeded to 1-2x
10^6 cells per plate a day before
transfection. Two µg RNA transcripts were
mixed with 8µl Enhancer in a final volume
of 100µl, mixed and incubated at room
temperature for 5 min. 16µl
TransMessenger reagent was added to the
RNA-Enhancer, mixed and incubated for
additional 10 min. Cell plates were washed
with sterile PBS twice and then incubated
with the transfection mix diluted with 2.5
ml DMEM medium without serum. Cells
were incubated with transfection mix for
three hours under their normal growth
condition (37°C and 5% CO2) before the
transfection mix was removed and a fresh
DMEM medium containing serum was
added to the cells. Cells were left to grow
48 hours before harvesting.

[0195] Target RNA cleavage: Cap-labeled
target RNAs were generated using
mMessage mMachine(TM) (Ambion). Capped
RNA transcripts were preincubated at 30°C
for 15 min in supplemented Hela S100
obtained from Computer Cell Culture, Mos,
Belgium. After addition of all components,
final concentrations were 100mM target
RNA, 1mM ATP, 0.2mM CTP, 10U/ml
RNasin, 30ug/ml creatine kinase, 25mM
creatine phosphate, and 50% S100 extract.
Incubation was continued for 4 hours to
overnight. Cleavage reaction was stopped
by the addition of 8 volumes of proteinase
K buffer (200mM Tris-HCl, pH 7.5, 25mM
EDTA, 300mM NaCl, and 2% SDS).
Proteinase K, dissolved in 50mM Tris-HCl,
pH 8, 5mM CaCl2, and 50% glycerol, was
added to a final concentration of 0.6
mg/ml. Samples were subjected to
phenol/chloroform extraction and kept
frozen until analyzed by urea-TBE PAGE.

[0196] Northern analysis: RNAs were extracted
from cells by using Tri-reagent according
to the manufacturer's protocol. The RNAs
were dissolved in water and heated to
65°C to disrupt any association of the
25nt RNA with larger RNA molecules. RNAs
were placed on ice and incubated for 30
min with PEG (MW=8000) in a final
concentration of 5% and NaCl in a final
concentration of 0.5M to precipitate high
molecular weight nucleic acid. The RNAs
were centrifuged at 10,000xg for 10 min
to pellet the high molecular weight nucleic
acid. The supernatant containing the low
molecular weight RNAs was collected and
three volumes of ethanol were added. The
RNAs were placed at -20°C for at least two
hours and then centrifuged at 10,000xg
for 10 min. The pellets were dissolved in
Urea-TBE buffer (1xTBE, 7M urea) for
further analysis by a Northern blot.

[0197] RNA samples were boiled for 5 min before
loading on 15%-8% polyacrylamide (19:1)
gels containing 7M urea and 1xTBE. Gels
were run in 1xTBE at a constant voltage of
300V and then transferred onto a nylon
membrane. The membrane was exposed to
ultraviolet light for 3 min to cross link the
RNAs to the membrane. Hybridization was
performed overnight with DIG-labeled
probes at 42°C. Membranes were washed
twice with SSCx2 and 0.2% SDS for 10 min
at 42°C and then washed twice with
SSCx0.5 for 5 min at room temperature.
The membrane was then developed by
using a DIG luminescent detection kit
(Roche) using anti DIG and CSPD reaction,
according to the manufacturer's protocol.

[0198] It is appreciated that the data presented in
Figs. 21A, 21B, 21C and 21D, when taken
together validate the function of the
bioinformatic gene detection engine 100 of
Fig. 9. Fig. 21A shows a novel GAM gene
bioinformatically detected by the
bioinformatic gene detection engine 100,
and Figs. 21B, 21C and 21D show
laboratory confirmation of the expression
of this novel gene. This is in accord with
the engine training and validation
methodology described hereinabove with
reference to Fig. 10.

[0199] Reference is now made to Fig. 22A which is
an annotated sequence of an EST
comprising a novel gene detected by the
gene detection system of the present
invention. Fig. 22A shows the nucleotide
sequence of a known human non-protein
coding EST (Expressed Sequence Tag),
identified as EST7929020. It is
appreciated that the sequence of this EST
comprises sequences of two novel GAM
genes, referred to here as GAM23 and
GAM25, detected by the bioinformatic
gene detection system of the present
invention, described hereinabove with
reference to Fig. 9.

[0200] Reference is now made to Fig. 22B which
presents pictures of laboratory results, that
demonstrate laboratory confirmation of
expression of the bioinformatically
detected novel gene of Fig. 22A. Northern
blot analysis of hairpins in EST7929020.
Left, Northern reacted with predicted
GAM25 hairpin probe; Right, Northern
reacted with predicted GAM23 hairpin
probe. The molecular size of EST is
indicated by arrow. The molecular sizes of
GAM23 and GAM25 are 60nt, as indicated
by arrow. The 22nt molecular marker is
indicated by arrow. 1-Hela lysate; 2-EST
incubated 4h with Hela lysate; 3-EST
incubated overnight with Hela lysate; 4-
EST without lysate; 5-GAM transcript; 6-
GAM 22nt marker; 7-GAM PCR probe; 8-
RNA from control Hela cells; 9-RNA
extracted from Hela cells following
transfection with EST.

[0201] Reference is now made to Fig. 22C which is
a picture of a Northern blot confirming
endogenous expression of
bioinformatically detected gene GAM25 of
Fig. 22A in Hela cells. Northern was
reacted with a predicted GAM25 hairpin
probe. The molecular size of EST7929020
is indicated. The molecular size of GAM25
is 58nt, as indicated. A 19nt DNA oligo
molecular marker is indicated. Endogenous
expression of GAM25 in Hela total RNA
fraction and in S-100 fraction is indicated
by arrows. 1-GAM25 transcript; 2-GAM25
DNA oligo marker; 3-RNA from control
Hela cells; 4-RNA extracted from Hela cells
following transfection with EST; 5-RNA
extracted from S-100 Hela lysate.

[0202] Reference is now made to Fig. 23A which is
an annotated sequence of an EST
comprising a novel gene detected by the
gene detection system of the present
invention. Fig. 23A shows the nucleotide
sequence of a known human non-protein
coding EST (Expressed Sequence Tag),
identified as EST1388749. It is
appreciated that the sequence of this EST
comprises the sequence of a novel GAM gene,
referred to here as GAM26, detected by the
bioinformatic gene detection system of the
present invention, described hereinabove
with reference to Fig. 9.

[0203] Reference is now made to Fig. 23B which is
a picture of Northern blot analysis,
confirming expression of novel
bioinformatically detected gene GAM26,
and natural processing thereof from
EST1388749. Northern reacted with
predicted GAM26 hairpin probe. The
molecular size of EST is indicated by arrow.
The molecular size of GAM26 is 130nt, as
indicated by arrow. The 22nt molecular
marker is indicated by arrow. 1-Hela
lysate; 2-EST incubated 4h with Hela
lysate; 3-EST incubated overnight with
Hela lysate; 4-EST without lysate; 5-GAM
transcript; 6-GAM 22nt marker; 7-GAM
PCR probe.

[0204] It is appreciated by persons skilled in the
art that the present invention is not limited
by what has been particularly shown and
described hereinabove and in Tables 1-4.
Rather the scope of the present invention
includes both combinations and
subcombinations of the various features
described hereinabove as well as variations
and modifications which would occur to
persons skilled in the art upon reading the
specifications and which are not in the
prior art.

[0205] A bibliography is included with reference
to Table 5.